This content is now available from Statistical Associates Publishers.
Below is the overview and table of contents in unformatted form.
Overview
Correlation is a bivariate measure of association (that is, of effect size or strength) between two variables. It ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship). It is usually reported in terms of its square (r²), interpreted as the proportion of variance explained. For instance, if r² is .25, then the independent variable is said to explain 25% of the variance in the dependent variable.
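As a minimal sketch of the definition above, the following Python computes Pearson's r from deviations about the mean and then squares it. The function name pearson_r and the five data points are made up for illustration; they do not come from the book.

```python
import math

def pearson_r(x, y):
    """Pearson's r: sum of cross-products of deviations, divided by the
    square root of the product of the two sums of squared deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev_x = [xi - mean_x for xi in x]   # deviations of x from its mean
    dev_y = [yi - mean_y for yi in y]   # deviations of y from its mean
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data: hours studied (independent) and test score (dependent).
hours = [1, 2, 3, 4, 5]
score = [2, 1, 4, 3, 5]

r = pearson_r(hours, score)
r_squared = r ** 2   # share of variance in score "explained" by hours
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")   # r = 0.80, r^2 = 0.64
```

With these numbers r is .80, so r² is .64, and hours would be said to explain 64% of the variance in score.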
There are several common pitfalls when using correlation. Correlation is symmetrical and so provides no evidence of which way causation flows. If unmeasured variables also cause the dependent variable, then any covariance they share with the given independent variable may be falsely attributed to that independent variable. Also, to the extent that the relationship between the two variables is nonlinear, correlation will understate the strength of the relationship. Correlation will also be attenuated to the extent there is measurement error, including use of sub-interval data or artificial truncation of the range of the data. Correlation can also be a misleading average if the relationship varies depending on the value of the independent variable ("lack of homoscedasticity"). And, of course, atheoretical or post-hoc running of many correlations runs the risk that, at the .05 significance level, about 5% of the coefficients will be found significant by chance alone.
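Three of these pitfalls can be illustrated with a short Python sketch. All data below are invented for illustration, the helper pearson_r is a hypothetical name, and the last part assumes every true correlation is zero and every test uses the .05 level.

```python
import math

def pearson_r(x, y):
    """Pearson's r from sums of squared deviations and cross-products."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# 1. Nonlinearity: y is completely determined by x (y = x squared), yet the
#    linear correlation is zero because the curve is symmetric about x = 0.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# 2. Truncated range: the same noisy linear relationship, measured once over
#    the full range of x and once over the middle of the range only.
x_full = list(range(1, 11))
y_full = [v + (1 if v % 2 == 1 else -1) for v in x_full]   # x plus alternating error
r_full = pearson_r(x_full, y_full)

keep = [i for i, v in enumerate(x_full) if 4 <= v <= 7]    # artificial truncation
x_cut = [x_full[i] for i in keep]
y_cut = [y_full[i] for i in keep]
r_cut = pearson_r(x_cut, y_cut)

# 3. Many correlations: k variables yield k(k-1)/2 distinct coefficients, and
#    the expected number significant by chance is alpha times that count.
k = 10
alpha = 0.05
n_pairs = k * (k - 1) // 2
expected_by_chance = alpha * n_pairs

print(f"curved relationship: r = {r_curve:.2f}")
print(f"full range: r = {r_full:.2f}; truncated range: r = {r_cut:.2f}")
print(f"{n_pairs} correlations, about {expected_by_chance:.2f} expected significant by chance")
```

In the second part the truncated sample yields a smaller r than the full sample even though the underlying relationship is unchanged, which is the attenuation described above.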
Besides Pearsonian correlation (r), by far the most common type, there are special types of correlation that handle the characteristics of such variables as dichotomies, and there are other measures of association for nominal and ordinal variables. Regression procedures produce multiple correlation, R, which is the correlation of multiple independent variables with a single dependent. Also, there is partial correlation, which is the correlation of one variable with another, controlling both the given variable and the dependent for a third or additional variables; it is used to model three to five variables. And there is part correlation, which is the correlation of one variable with another, controlling only the given variable for a third or additional variables. The b coefficients in regression are part (semi-partial) coefficients. These topics are discussed in separate volumes of the "blue book" series.
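The difference between partial and part correlation can be sketched with the standard first-order formulas, which need only the three pairwise correlations among x (the given variable), y (the dependent), and z (the control). The function names and the three input correlations are illustrative assumptions, not values from the book.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of x and y, with z removed from BOTH x and y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def part_corr(r_xy, r_xz, r_yz):
    """Part (semi-partial) correlation of x and y, with z removed from x ONLY."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)

def multiple_r(r_yx, r_yz, r_xz):
    """Multiple correlation R of y with two independents, x and z."""
    r_sq = (r_yx ** 2 + r_yz ** 2 - 2 * r_yx * r_yz * r_xz) / (1 - r_xz ** 2)
    return math.sqrt(r_sq)

# Hypothetical pairwise correlations.
r_xy, r_xz, r_yz = 0.5, 0.6, 0.4

partial = partial_corr(r_xy, r_xz, r_yz)
part = part_corr(r_xy, r_xz, r_yz)
big_r = multiple_r(r_xy, r_yz, r_xz)

print(f"zero-order r = {r_xy:.3f}")
print(f"partial r    = {partial:.3f}")
print(f"part r       = {part:.3f}")
print(f"multiple R   = {big_r:.3f}")
```

Both coefficients share the same numerator; they differ only in whether the variance z shares with the dependent is also removed from the denominator, so the part correlation is never larger in absolute value than the partial. The squared part correlation is also the increase in R² when x is added to a regression that already contains z.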
Table of Contents
Overview
Key Concepts and Terms
Basic terms
Deviation
Covariance
Standardization
Correlation
Correlation for interval data
Pearsonian correlation
Pearson's r
Coefficient of determination, r²
Attenuation of correlation
Ordinal correlation
Correlation for ordinal and dichotomous data
Spearman's rho
Kendall's tau-b
Polyserial correlation
Polychoric correlation
Pearsonian and ordinal correlation in SPSS
Example
SPSS correlation dialog
Options
Pearson correlation output
Ordinal correlation output
Pearsonian and ordinal correlation with SAS
Example
SAS syntax
SAS PROC CORR output
PLOTS output
Cronbach coefficient alpha table
Correlation for dichotomies
Point-biserial correlation
Biserial correlation
Converting point-biserial to biserial correlation
Rank biserial correlation
Phi
Other types of correlation
Tetrachoric correlation
Correlation ratio, eta
Coefficient of intraclass correlation (ICC)
Assumptions
Interval level data
Linear relationships
Homoscedasticity
No outliers
Minimal measurement error
Unrestricted variance
Similar underlying distributions
Common underlying normal distributions
Normally distributed error terms
Frequently Asked Questions
Do I want one-tailed or two-tailed significance?
How many correlations will there be among k variables?
What rules exist for determining the appropriate significance level for testing correlation coefficients?
How do I convert correlations into z scores?
Z-Score Conversions of Pearson's r
How is the significance of a correlation coefficient computed?
Significance of r
Significance of the difference between two correlations from two independent samples
Significance of the difference between two dependent correlations from the same sample
How do I set confidence limits on my correlation coefficients?
I have ordinal variables and thus used Spearman's rho. How do I use these ordinal correlations in SPSS for partial correlation, regression, and other procedures?
What is the relation of correlation to ANOVA?
What is the relation of correlation to validity?
What is the SPSS syntax for correlation?
Bibliography