Correlation is a bivariate measure of association (that is, of effect size or strength) of the relationship between two variables. It ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship). It is usually reported in terms of its square (r2), interpreted as the percent of variance explained. For instance, if r2 is .25, then the independent variable is said to explain 25% of the variance in the dependent variable.
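As a concrete illustration of r and r2, the sketch below computes Pearson's r from its definition (covariance divided by the product of the standard deviations). The data and the helper name pearson_r are hypothetical, chosen only to show a perfect linear relationship yielding r = 1:

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: y is an exact linear function of x, so r = 1
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = pearson_r(x, y)
print(r, r ** 2)  # r = 1.0, r2 = 1.0 (100% of variance explained)
```

In practice a library routine (e.g., scipy.stats.pearsonr) would be used, but the arithmetic is the same.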

There are several common pitfalls when using correlation. Correlation is symmetrical, providing no evidence of which way causation flows. If unmeasured variables also cause the dependent variable, then any covariance they share with the given independent variable may be falsely attributed to that independent variable. Also, to the extent that there is a nonlinear relationship between the two variables being correlated, correlation will understate the relationship. Correlation will also be attenuated to the extent there is measurement error, including use of sub-interval data or artificial truncation of the range of the data. Correlation can also be a misleading average if the strength of the relationship varies with the value of the independent variable (heteroscedasticity). And, of course, atheoretical or post hoc running of many correlations runs the risk that, at the .05 level, about 5% of the coefficients will appear significant by chance alone.
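The nonlinearity pitfall is easy to demonstrate. In the sketch below (hypothetical data; pearson_r is the same textbook formula as above, restated so the example is self-contained), y is perfectly determined by x, yet r comes out 0 because the relationship is U-shaped rather than linear:

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: y = x**2, a perfect but nonlinear (U-shaped) relationship
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # [4, 1, 0, 1, 4]
print(pearson_r(x, y))    # 0.0 -- r misses the relationship entirely
```

This is why inspecting a scatterplot is recommended before relying on r.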

Besides Pearsonian correlation (r), by far the most common type, there are other special types of correlation to handle the special characteristics of such types of variables as dichotomies, and there are other measures of association for nominal and ordinal variables. Regression procedures produce multiple correlation, R, which is the correlation of multiple independent variables with a single dependent variable. Also, there is partial correlation, which is the correlation of one variable with another, controlling both the given variable and the dependent for a third or additional variables, used to model three to five variables. And there is part (semi-partial) correlation, which is the correlation of one variable with another, controlling only the given variable for a third or additional variables. The b coefficients in regression are partial coefficients, each reflecting the effect of one independent variable controlling for the others. These topics are discussed in separate volumes of the "Blue Book" series.
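The distinction between partial and part correlation can be made concrete with the standard first-order formulas, which derive both from the three zero-order correlations among x, y, and a control variable z. The sketch below is illustrative only; the function names and the numeric inputs are hypothetical:

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling BOTH for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def part_r(r_xy, r_xz, r_yz):
    """Part (semi-partial) correlation: only the given variable x is
    residualized on z; y is left as-is."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)

# Hypothetical zero-order correlations among x, y, and z
print(partial_r(0.6, 0.5, 0.5))  # ~0.467
print(part_r(0.6, 0.5, 0.5))     # ~0.404
```

Note the partial coefficient exceeds the part coefficient here because the part formula removes z's variance from only one of the two variables.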

The full content is now available from Statistical Associates Publishers.

Below is the unformatted table of contents.


Table of Contents

Key Concepts and Terms
Basic terms
Correlation for interval data
Pearsonian correlation
Pearson's r
Coefficient of determination, r2
Attenuation of correlation
Ordinal correlation
Correlation for ordinal and dichotomous data
Spearman's rho
Kendall's tau-b
Polyserial correlation
Polychoric correlation
Pearsonian and ordinal correlation in SPSS
SPSS correlation dialog
Pearson correlation output
Ordinal correlation output
Pearsonian and ordinal correlation with SAS
SAS syntax
PLOTS output
Cronbach coefficient alpha table
Correlation for dichotomies
Point-biserial correlation
Biserial correlation
Converting point-biserial to biserial correlation
Rank biserial correlation
Other types of correlation
Tetrachoric correlation
Correlation ratio, eta
Coefficient of intraclass correlation (ICC)
Assumptions
Interval level data
Linear relationships
No outliers
Minimal measurement error
Unrestricted variance
Similar underlying distributions
Common underlying normal distributions
Normally distributed error terms
Frequently Asked Questions
Do I want one-tailed or two-tailed significance?
How many correlations will there be among k variables?
What rules exist for determining the appropriate significance level for testing correlation coefficients?
How do I convert correlations into z scores?
Z-Score Conversions of Pearson's r
How is the significance of a correlation coefficient computed?
Significance of r
Significance of the difference between two correlations from two independent samples
Significance of the difference between two dependent correlations from the same sample
How do I set confidence limits on my correlation coefficients?
I have ordinal variables and thus used Spearman's rho. How do I use these ordinal correlations in SPSS for partial correlation, regression, and other procedures?
What is the relation of correlation to ANOVA?
What is the relation of correlation to validity?
What is the SPSS syntax for correlation?