Overview
Correspondence analysis is a method of factoring categorical variables and displaying them in a property space which maps their association in two or more dimensions. It is often used where a tabular approach is less effective because the tables are large, with many rows and/or columns. Though not limited to that arena, correspondence analysis has been popular in marketing research, for example to display such variables as customer color preference, size preference, and taste preference in relation to preferences for Brands A, B, and C. Correspondence analysis is a special case of canonical correlation, where one set of entities (categories rather than variables as in conventional canonical correlation) is related to another set.
Correspondence analysis starts with tabular data, usually two-way cross-classifications, though the technique is generalizable to n-way tables with more than two variables. The variables must be discrete: nominal, ordinal, or continuous variables segmented into ranges. The technique defines a measure of distance between any two points, where points are the values (categories) of the discrete variables. Since distance is a type of measure of association (correlation), the distance matrix can be the input to principal components analysis, just as correlation matrices may be the input for conventional factor analysis. However, where conventional factor analysis determines which variables cluster together, correspondence analysis determines which category values are close together. This is visualized on the correspondence map, which plots points (categories) along the computed factor axes.
Because the definition of point distance in correspondence analysis does not support significance testing, it is recommended that some other technique compatible with discrete data, such as log-linear modeling or logistic regression, be used to test alternative models. After a best-fitting model has been selected using another technique, correspondence analysis may be very useful in exploring relationships within that model.

Though symmetrical normalization is designed for comparing row points with column points, under any form of standardization one cannot precisely interpret the distance between a row point and a column point. Rather, one must make a less precise, general statement, such as noting which row points and column points appear in the same map quadrant.
The sum of contributions of dimensions to points will add to 1.0 across all dimensions for a given point in the full solution where all possible dimensions are computed. However, the interpreted dimensions usually will sum to less than 1.0.
Note that a high contribution of a point to a dimension implies a high squared correlation, but the reverse is not true. That is, if a point explains a lot of the variance in a dimension, usually that dimension will also describe the point very well (high squared correlation). However, just because a dimension describes a point well does not mean the point will necessarily be important in explaining the dimension.
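The two kinds of contribution discussed above can be written out explicitly. The notation here is an assumption introduced for illustration and does not appear in the original text: f_{ik} is the principal coordinate of point i on dimension k, r_i is the mass of point i, lambda_k is the principal inertia (eigenvalue) of dimension k, and d_i^2 is the squared distance of point i from the centroid in the full solution.

```latex
% Contribution of point i to dimension k (sums to 1 over points i):
\mathrm{CTR}_{ik} = \frac{r_i \, f_{ik}^{2}}{\lambda_k},
\qquad \sum_{i} \mathrm{CTR}_{ik} = 1

% Contribution of dimension k to point i, i.e. the squared correlation
% (sums to 1 over all dimensions k of the full solution):
\mathrm{COR}_{ik} = \frac{f_{ik}^{2}}{d_i^{2}},
\qquad d_i^{2} = \sum_{k} f_{ik}^{2},
\qquad \sum_{k} \mathrm{COR}_{ik} = 1

% Link between the two quantities:
\mathrm{CTR}_{ik} = \frac{r_i \, d_i^{2}}{\lambda_k} \, \mathrm{COR}_{ik}
```

Under this notation, the contribution of a point to a dimension is its squared correlation scaled by the point's mass and its distance from the centroid. A point with little mass can therefore be described very well by a dimension and still contribute little to it.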
Correspondence analysis is now supported by several programs, some of which are:
Note that since the average row profile element is used inversely (1/a(.j)), categories with few observations (as reflected in lower average row profile elements) contribute more to interpoint distances, because the divisor is smaller. For instance, if party ID forms the columns and media type the rows, and if Libertarians are a small group, their small row profile elements are compensated by dividing by their small average row profile element. The effect is to equalize the importance of the column categories, with Libertarians being as important as Democrats when comparing distances among media types.
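The inverse weighting described above can be illustrated with a minimal sketch of the chi-square distance between two row profiles. The counts below are hypothetical and chosen only so that Libertarians form a small column. The function names are illustrative and not taken from any statistical package.

```python
def row_profiles(table):
    """Divide each row of a contingency table by its row total."""
    return [[cell / sum(row) for cell in row] for row in table]


def average_row_profile(table):
    """Column totals divided by the grand total: the a(.j) weights."""
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    return [sum(row[j] for row in table) / total for j in range(n_cols)]


def chi_square_distance(profile_a, profile_b, avg_profile):
    """Distance between two row profiles, each squared gap weighted by 1/a(.j)."""
    squared = sum(
        (a - b) ** 2 / avg
        for a, b, avg in zip(profile_a, profile_b, avg_profile)
    )
    return squared ** 0.5


# Hypothetical counts. Rows are media types; columns are
# party ID: Democrat, Republican, Libertarian.
table = [
    [60, 30, 10],  # newspaper
    [40, 50, 10],  # television
    [50, 45, 5],   # radio
]

profiles = row_profiles(table)
avg = average_row_profile(table)   # [0.5, 0.4167, 0.0833] approximately
d_news_tv = chi_square_distance(profiles[0], profiles[1], avg)
d_news_radio = chi_square_distance(profiles[0], profiles[2], avg)
```

With these made-up counts, a gap of 0.05 in the Libertarian column adds 0.03 to a squared distance, six times the 0.005 added by the same gap in the Democrat column, because the Libertarian divisor is one sixth the size.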
Detrending removes the arch effect. This is done by dividing the map into a series of vertical partitions, thus dividing the map along the primary (horizontal) axis. Within each partition, that cluster of points is relocated to center on the second (vertical) axis's 0 point. This arbitrary adjustment of the data has been the subject of methodological criticism.
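The partition-and-center idea can be sketched in a few lines. This is a simplified illustration of the description above, assuming equal-width partitions and hypothetical axis scores, and it is not the algorithm as implemented in any particular DCA program.

```python
def detrend_by_segments(axis1, axis2, n_segments):
    """Center axis-2 scores on 0 within equal-width partitions of axis 1."""
    lo, hi = min(axis1), max(axis1)
    # Fall back to a width of 1.0 if every axis-1 score is identical.
    width = (hi - lo) / n_segments or 1.0

    def segment(x):
        # The maximum score would land in segment n_segments; clamp it.
        return min(int((x - lo) / width), n_segments - 1)

    groups = {}
    for i, x in enumerate(axis1):
        groups.setdefault(segment(x), []).append(i)

    detrended = list(axis2)
    for members in groups.values():
        mean = sum(axis2[i] for i in members) / len(members)
        for i in members:
            detrended[i] = axis2[i] - mean
    return detrended


# Hypothetical scores forming an arch: axis 2 rises and then falls
# as axis 1 increases.
axis1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
axis2 = [-1.2, -0.5, 0.1, 0.6, 0.9, 0.6, 0.1, -0.5, -1.2]

flat = detrend_by_segments(axis1, axis2, n_segments=3)
```

After the call, the axis-2 scores within each of the three partitions average to zero, so the arch no longer dominates the second axis. The choice of the number of partitions is arbitrary, which is one reason the adjustment has drawn criticism.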
Rescaling is a second step in DCA. Where detrending realigned the points with respect to the secondary (vertical) axis, rescaling realigns the points along the primary (horizontal) axis as well as the vertical axis. Both axes are rescaled such that units represent standard deviations, seeking to make distance in ordination space mean the same thing along the axes of the map. Note that rescaling requires numeric (not nominal) measurement of points associated with the primary axis.
Together, detrending and rescaling may remove the arch effect, remove compression at the ends of axes, and make the distances separating points easier to interpret. Detrending is common in ecological uses of correspondence analysis.
Multiple correspondence analysis (MCA) is the generalized extension of correspondence analysis to handle more than two variables. The input to MCA is a design matrix in which cases are rows and categories of variables are columns. Cell values in this matrix are 1's or 0's, depending on whether the case does or does not belong to the category. Interpretation of correspondence maps is similar to that for simple correspondence analysis.
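The design matrix described above can be built as in the following minimal sketch. The survey cases and variable names are hypothetical, and the function name is illustrative and not drawn from any package.

```python
def indicator_matrix(cases, variables):
    """Build the 0/1 design matrix: one row per case, one column per category."""
    columns = []
    for var in variables:
        # One column for every category observed for this variable.
        for category in sorted({case[var] for case in cases}):
            columns.append((var, category))
    matrix = [
        [1 if case[var] == category else 0 for var, category in columns]
        for case in cases
    ]
    return columns, matrix


# Hypothetical survey responses, one dict per case.
cases = [
    {"color": "red", "size": "large", "brand": "A"},
    {"color": "blue", "size": "small", "brand": "B"},
    {"color": "red", "size": "small", "brand": "C"},
    {"color": "green", "size": "large", "brand": "A"},
]
variables = ["color", "size", "brand"]

columns, matrix = indicator_matrix(cases, variables)
```

Each row contains exactly one 1 per variable, so every row sums to the number of variables (3 here), and the matrix has one column for each of the 8 observed categories.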
Copyright 1998, 2008 by G. David Garson.
Last update, 3.34/2008.