Overview
Of the measures in this section, phi, the contingency coefficient (C), Tschuprow's T, and Cramer's V are based on adjusting the chi-square statistic to factor out sample size. These measures do not lend themselves to an easily expressible interpretation. The adjusted contingency coefficient C* and Cramer's V vary between 0 and 1 regardless of table size, whereas phi, C, and T do not. V is by far the most used measure of association in this subset. Lambda is popular because of its easily understood interpretation in terms of proportionate reduction in error (PRE), varying from 0 to 1, but it defines perfect association as predictive monotonicity and null relationship as accord, unlike most measures, which use the criteria of strict monotonicity and statistical independence, as discussed in the section on association. The uncertainty coefficient also has a PRE meaning, but its formula takes into account the entire distribution rather than just the mode (which lambda uses), and it may therefore be preferred to lambda.
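The contrast in ranges among the chi-square-based measures can be illustrated with a minimal sketch. It assumes the standard textbook definitions of C, C*, V, and T, which this overview does not spell out; the function names and the example table are hypothetical, and the code assumes no row or column total is zero.

```python
import math

def chi_square(table):
    """Pearson chi-square (no Yates' correction) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def chi_square_measures(table):
    """Return phi, C, adjusted C (C*), Cramer's V, and Tschuprow's T."""
    chi2, n = chi_square(table)
    r, c = len(table), len(table[0])
    k = min(r, c)
    phi = math.sqrt(chi2 / n)
    C = math.sqrt(chi2 / (chi2 + n))
    # Assumed adjustment: divide C by its maximum, SQRT((k - 1)/k).
    C_adj = C / math.sqrt((k - 1) / k)
    V = math.sqrt(chi2 / (n * (k - 1)))
    T = math.sqrt(chi2 / (n * math.sqrt((r - 1) * (c - 1))))
    return {"phi": phi, "C": C, "C*": C_adj, "V": V, "T": T}

# Hypothetical 3-by-3 table with every case on the main diagonal.
perfect_3x3 = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
measures = chi_square_measures(perfect_3x3)
print({name: round(value, 3) for name, value in measures.items()})
```

For this perfectly associated 3-by-3 table, phi is about 1.414 and C about 0.816, while C* and V equal 1. T also equals 1 here because the table is square; in non-square tables T cannot reach 1.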
Phi can handle larger tables but is often used as a measure of association in 2-by-2 tables formed by true dichotomies. With dichotomized continuous data, tetrachoric correlation is preferred. When used with 2-by-2 tables, phi is the geometric mean of the percent difference across rows and the percent difference across columns. That is, it can be interpreted as a symmetric version of percent difference. (Recall that the geometric mean is a measure of central tendency computed by multiplying n values and taking the nth root of their product.) For larger tables, the maximum value of phi depends on table size and can exceed 1.0, which makes phi poorly suited as a measure of association for such tables.
Computationally, phi is the square root of chi-square divided by n, the sample size: phi = SQRT(chi-square/n). When computing phi, note that Yates' correction to chi-square is not used. In a 2-by-2 table, phi thus measures the strength of the relationship defined as the product of the cell counts on one diagonal minus the product of the cell counts on the other diagonal, adjusting for the marginal distribution of the variables.
| Party/Vote | Democrat | Republican |
| Voted | 15 | 10 |
| Didn't Vote | 5 | 20 |
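As a check on the two descriptions of phi, the following minimal sketch computes phi for the party/vote table above in two ways: from chi-square without Yates' correction, and as the geometric mean of the two percent differences. The variable names are illustrative only.

```python
import math

# Party/vote table: rows = Voted, Didn't Vote; columns = Democrat, Republican.
a, b = 15, 10
c, d = 5, 20
n = a + b + c + d

# Pearson chi-square for a 2-by-2 table, without Yates' correction.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
phi = math.sqrt(chi2 / n)

# Percent voting among Democrats minus percent voting among Republicans.
diff_by_party = a / (a + c) - b / (b + d)
# Percent Democrat among voters minus percent Democrat among non-voters.
diff_by_vote = a / (a + b) - c / (c + d)
geometric_mean = math.sqrt(diff_by_party * diff_by_vote)

print(round(phi, 3), round(geometric_mean, 3))  # 0.408 0.408
```

The two percent differences here are about .417 and .400, and both routes give phi of about .408.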
| | City Size 1 | City Size 2 | City Size 3 | Row Totals |
| City Manager | 80 | 9 | 1 | 90 |
| Mayor-Council | 40 | 1 | 9 | 50 |
| Column Totals | 120 | 10 | 10 | n = 140 |
lambda = [(80 + 9 + 9) - 90]/(140-90) = .16
In the table above, knowing city size reduces errors in guessing form of government by 16%. The number of errors made when not knowing the independent is found by subtracting the modal category of the dependent, 90, from n, which is 140: 140 - 90 = 50 errors. That is, if one did not know the distribution of the independent variable, one would guess that all cities were of the city manager type, and one would be right 90 times and wrong 50 times. Knowing the distribution of the independent, one would instead guess the modal category of the dependent within each column, for 80 + 9 + 9 = 98 correct guesses. The proportionate reduction in error (PRE) is this sum minus the 90 correct guesses one would have made anyway (98 - 90 = 8), divided by the number of errors one would have made anyway: 8/50 = .16.
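The same arithmetic can be written as a short function. This is a minimal sketch that treats the rows as the dependent variable and the columns as the independent variable; the function name is illustrative, and the code assumes the cases do not all fall in a single row (otherwise the denominator is zero).

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the row variable from the column variable."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Correct guesses without knowing the independent: the modal row total.
    modal_row_total = max(row_totals)
    # Correct guesses knowing the independent: the modal cell of each column.
    sum_of_column_modes = sum(max(col) for col in zip(*table))
    return (sum_of_column_modes - modal_row_total) / (n - modal_row_total)

# Form of government (rows) by city size (columns), from the table above.
government_by_city_size = [
    [80, 9, 1],   # City Manager
    [40, 1, 9],   # Mayor-Council
]
lam = goodman_kruskal_lambda(government_by_city_size)
print(round(lam, 2))  # 0.16
```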
Since lambda has a known sampling distribution (it is asymptotically normal), it is possible to compute its standard error and significance. SPSS, SAS, and other major packages report the ASE (asymptotic standard error). Lambda divided by its ASE gives the coefficient used to compute p(lambda), the probability or significance level of the computed lambda value. The formula for the variance of lambda is given in SAS (1988: 533-534) and in Liebetrau (1983: 20-23).
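The step from lambda and its ASE to a significance level can be sketched as follows, assuming a two-sided test against the standard normal distribution. The ASE itself must come from a package or from the variance formula cited above; the value used in the example is hypothetical and is not computed from the table.

```python
import math

def lambda_significance(lam, ase):
    """Return the z coefficient (lambda / ASE) and its two-sided normal p value."""
    z = lam / ase
    # Two-sided tail area of the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Hypothetical ASE of 0.08, chosen only to illustrate the arithmetic.
z, p = lambda_significance(0.16, 0.08)
print(round(z, 2), round(p, 4))
```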
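The formula for UC(R|C), which is the uncertainty coefficient for predicting the row variable on the basis of the column variable, is given below in its standard entropy-based form, with its variants:
UC(R|C) = [H(R) + H(C) - H(RC)]/H(R)
UC(C|R) = [H(R) + H(C) - H(RC)]/H(C)
UC(symmetric) = 2[H(R) + H(C) - H(RC)]/[H(R) + H(C)]
where H(R) = -SUM[p(i.) ln p(i.)] is the entropy of the row marginal proportions, H(C) = -SUM[p(.j) ln p(.j)] is the entropy of the column marginal proportions, and H(RC) = -SUM[p(ij) ln p(ij)] is the entropy of the cell proportions.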
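A minimal sketch of the computation follows, assuming the standard entropy-based definition of the uncertainty coefficient with natural logarithms: the reduction in the entropy of the row variable from knowing the column variable, divided by the entropy of the row variable. The function names and the two small tables are hypothetical.

```python
import math

def entropy(counts, n):
    """Entropy of a set of counts summing to n; empty cells contribute zero."""
    return -sum((x / n) * math.log(x / n) for x in counts if x > 0)

def uncertainty_coefficient(table):
    """UC(R|C): proportionate reduction in row-variable entropy given the column variable."""
    n = sum(sum(row) for row in table)
    h_rows = entropy([sum(row) for row in table], n)
    h_cols = entropy([sum(col) for col in zip(*table)], n)
    h_joint = entropy([x for row in table for x in row], n)
    return (h_rows + h_cols - h_joint) / h_rows

# Hypothetical tables: perfect association and statistical independence.
uc_perfect = uncertainty_coefficient([[10, 0], [0, 10]])
uc_independent = uncertainty_coefficient([[10, 10], [10, 10]])
print(round(uc_perfect, 3), round(uc_independent, 3))
```

Because every cell enters the entropy terms, the result reflects the entire distribution rather than only the modal categories used by lambda.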
Copyright 1998, 2008 by G. David Garson.
Last updated 3/24/2008.