Overview
Multiple discriminant analysis (MDA) is an extension of discriminant analysis and a cousin of multivariate analysis of variance (MANOVA), sharing many of the same assumptions and tests. MDA is used to classify a categorical dependent variable that has more than two categories, using as predictors a number of interval or dummy independent variables. MDA is sometimes also called discriminant factor analysis or canonical discriminant analysis. There are several purposes for DA and/or MDA:
Discriminant analysis has two steps: (1) an F test (Wilks' lambda) is used to test if the discriminant model as a whole is significant, and (2) if the F test shows significance, then the individual independent variables are assessed to see which differ significantly in mean by group, and these are used to classify the dependent variable. Discriminant analysis shares all the usual assumptions of correlation, requiring linear and homoscedastic relationships, and untruncated interval or near-interval data. Like multiple regression, it also assumes proper model specification (inclusion of all important independents and exclusion of extraneous variables). DA also assumes the dependent variable is a true dichotomy, since data which are forced into dichotomous coding are truncated, attenuating correlation. DA is an earlier alternative to logistic regression, which is now frequently used in place of DA as it usually involves fewer violations of assumptions (independent variables needn't be normally distributed, linearly related, or have equal within-group variances), is robust, handles categorical as well as continuous variables, and has coefficients which many find easier to interpret. Logistic regression is preferred when data are not normal in distribution or group sizes are very unequal. However, discriminant analysis is preferred when the assumptions of linear regression are met, since then DA has more statistical power than logistic regression (less chance of Type II errors, that is, of accepting a false null hypothesis). See also the separate topic on multiple discriminant function analysis (MDA) for dependents with more than two categories.
The first function maximizes the differences between the values of the dependent variable. The second function is orthogonal to it (uncorrelated with it) and maximizes the differences between values of the dependent variable, controlling for the first factor. And so on. Though mathematically different, each discriminant function is a dimension which differentiates a case into categories of the dependent (here, religions) based on its values on the independents. The first function will be the most powerful differentiating dimension, but later functions may also represent additional significant dimensions of differentiation.
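The way successive functions are extracted can be illustrated with a minimal NumPy sketch. It assumes the textbook formulation in which the functions are eigenvectors of W^-1 B (W is the pooled within-groups and B the between-groups sums of squares and cross-products matrix); the function name `discriminant_functions` and the simulated three-group data are hypothetical, and the scaling of the coefficients is one convention among several, not necessarily the one SPSS reports.

```python
import numpy as np

def discriminant_functions(X, y):
    """Return (eigenvalues, coefficients) of the discriminant functions, strongest first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # pooled within-groups sums of squares and cross-products
    B = np.zeros((p, p))  # between-groups sums of squares and cross-products
    for g in groups:
        Xg = X[y == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
        d = Xg.mean(axis=0) - grand_mean
        B += len(Xg) * np.outer(d, d)
    # Turn W^-1 B v = lambda v into a symmetric problem via the Cholesky factor of W.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals, U = np.linalg.eigh(L_inv @ B @ L_inv.T)
    # At most min(p, k - 1) functions exist; keep them in descending eigenvalue order.
    order = np.argsort(eigvals)[::-1][: min(p, len(groups) - 1)]
    coef = L_inv.T @ U[:, order]
    return eigvals[order], coef

# Hypothetical data: three groups, two interval predictors.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=m, size=(30, 2)) for m in ([0, 0], [3, 1], [1, 4])])
y_demo = np.repeat([0, 1, 2], 30)
eigvals, coef = discriminant_functions(X_demo, y_demo)
scores = (X_demo - X_demo.mean(axis=0)) @ coef  # one column of scores per function
```

With three groups and two predictors the sketch yields two functions; the first has the larger eigenvalue, and the two columns of `scores` are uncorrelated with each other.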
If one clicks the Statistics button in SPSS after running discriminant analysis and then checks "Unstandardized coefficients," then SPSS output will include the unstandardized discriminant coefficients.
As with regression, since these are partial coefficients, only the unique explanation of each independent is being compared, not considering any shared explanation. Also, if there are more than two groups of the dependent, the standardized discriminant coefficients do not tell the researcher between which groups the variable is most or least discriminating. For this purpose, group centroids and factor structure are examined. The standardized discriminant coefficients appear by default in SPSS (Analyze, Classify, Discriminant) in a table of "Standardized Canonical Discriminant Function Coefficients". In MDA, there will be as many sets of coefficients as there are discriminant functions (dimensions).
Stepwise Wilks' lambda also appears in the "Variables Not in the Analysis" table of stepwise DA output, after the "Sig. of F to Enter" column. Here the criterion is reversed: the variable with the lowest stepwise Wilks' lambda is the best candidate to add to the model in the next step.
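The "lowest lambda wins" rule can be sketched as follows. This is a simplified illustration, not the SPSS procedure: it assumes Wilks' lambda is computed as det(W)/det(T) for the current model plus one candidate, and it ignores the F-to-enter threshold. The names `wilks_lambda` and `best_candidate`, and the small dataset, are hypothetical.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T) for the predictors in X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dev = X - X.mean(axis=0)
    T = dev.T @ dev                      # total sums of squares and cross-products
    W = np.zeros_like(T)                 # pooled within-groups counterpart
    for g in np.unique(y):
        c = X[y == g] - X[y == g].mean(axis=0)
        W += c.T @ c
    return float(np.linalg.det(W) / np.linalg.det(T))

def best_candidate(X, y, in_model, candidates):
    """Return the candidate column whose addition gives the lowest Wilks' lambda."""
    X = np.asarray(X, dtype=float)
    lambdas = {j: wilks_lambda(X[:, in_model + [j]], y) for j in candidates}
    return min(lambdas, key=lambdas.get), lambdas

# Hypothetical data: column 0 is already in the model, columns 1 and 2 are candidates.
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_demo = np.array([
    [1, 1, 0], [2, -1, 1], [3, 1, 0], [4, -1, 1],
    [2, 1, 10], [3, -1, 11], [4, 1, 10], [5, -1, 11],
])
best, lambdas = best_candidate(X_demo, y_demo, in_model=[0], candidates=[1, 2])
```

Here column 2 separates the groups far more sharply than column 1, so it produces the lower lambda and is selected.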
Technically, structure coefficients are pooled within-groups correlations between the independent variables and the standardized canonical discriminant functions. When the dependent has more than two categories there will be more than one discriminant function. In that case, there will be multiple columns in the table, one for each function. The correlations then serve like factor loadings in factor analysis -- by considering the set of variables that load most heavily on a given dimension, the researcher may infer a suitable label for that dimension. The structure matrix correlations appear in SPSS output in the "Structure Matrix" table, produced by default under Analyze, Classify, Discriminant.
Thus for two-group DA, the structure coefficients show the order of importance of the discriminating variables by total correlation, whereas the standardized discriminant coefficients show the order of importance by unique contribution. The sign of the structure coefficient also shows the direction of the relationship. For multiple discriminant analysis, the structure coefficients additionally allow the researcher to see the relative importance of each independent variable on each dimension.
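As a rough illustration of the definition above, the following sketch computes pooled within-groups correlations between each variable and each discriminant function. The function names and the tiny two-group dataset are hypothetical; for the demonstration the single function is taken to be the classical two-group solution, with coefficients proportional to the inverse pooled covariance matrix times the difference in group means.

```python
import numpy as np

def pooled_within_cov(X, y):
    """Pooled within-groups covariance matrix."""
    groups = np.unique(y)
    W = np.zeros((X.shape[1], X.shape[1]))
    for g in groups:
        c = X[y == g] - X[y == g].mean(axis=0)
        W += c.T @ c
    return W / (len(X) - len(groups))

def structure_coefficients(X, y, coef):
    """Pooled within-groups correlations; rows are variables, columns are functions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Sw = pooled_within_cov(X, y)
    cov_xz = Sw @ coef                                # covariance of variables with scores
    var_z = np.einsum("ij,ij->j", coef, Sw @ coef)    # within-groups variance of each score
    sd_x = np.sqrt(np.diag(Sw))
    return cov_xz / np.outer(sd_x, np.sqrt(var_z))

# Hypothetical two-group data with two predictors.
X_demo = np.array([[1, 2], [2, 1], [3, 5], [4, 3],
                   [5, 1], [6, 3], [7, 2], [8, 6]], dtype=float)
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
mean_diff = X_demo[y_demo == 1].mean(axis=0) - X_demo[y_demo == 0].mean(axis=0)
coef_demo = np.linalg.solve(pooled_within_cov(X_demo, y_demo), mean_diff).reshape(-1, 1)
structure = structure_coefficients(X_demo, y_demo, coef_demo)
```

In this example both group-mean differences are positive, so both structure coefficients are positive, which matches the point that the sign shows the direction of the relationship.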
Large samples. Where sample size is large, even small differences in covariance matrices may be found significant by Box's M, when in fact no substantial problem of violation of assumptions exists. Therefore, the researcher should also look at the log determinants of the group covariance matrices, which are printed along with Box's M. If the group log determinants are similar, then a significant Box's M for a large sample is usually ignored. Dissimilar log determinants indicate violation of the assumption of equal variance-covariance matrices, leading to greater classification errors (specifically, DA will tend to classify cases in the group with the larger variability). When violation occurs, quadratic DA may be used (not supported by SPSS as of Version 13).
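Outside SPSS, the group log determinants can be inspected with a few lines of NumPy. The sketch below is illustrative only; the function name `group_log_determinants` and the data are hypothetical, and it does not compute Box's M itself.

```python
import numpy as np

def group_log_determinants(X, y):
    """Log determinant of each group's covariance matrix, keyed by group label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    result = {}
    for g in np.unique(y):
        cov = np.cov(X[y == g], rowvar=False)
        _sign, logdet = np.linalg.slogdet(cov)   # numerically safer than log(det(cov))
        result[g.item()] = float(logdet)
    return result

# Hypothetical data: group "b" has three times the spread of group "a".
base = np.array([[1, 2], [2, 1], [3, 5], [4, 3], [5, 6]], dtype=float)
X_demo = np.vstack([base, 3 * base + 10])
y_demo = np.array(["a"] * 5 + ["b"] * 5)
log_dets = group_log_determinants(X_demo, y_demo)
```

Because group "b" is group "a" stretched by a factor of 3 on both variables, its log determinant is larger by log(81), a gap of the kind the text describes as dissimilar.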
Dummy variables. As in regression, dummy variables must be assessed as a group, not on the basis of individual beta weights. This is done through hierarchical discriminant analysis, running the analysis first with, then without the set of dummies. The difference in the squared canonical correlation indicates the explanatory effect of the set of dummies.
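The with-and-without comparison might be sketched as below. The sketch covers the two-group case only, where there is a single discriminant function and the squared canonical correlation equals 1 minus Wilks' lambda; the function name, the interval predictor, and the two dummies are hypothetical.

```python
import numpy as np

def squared_canonical_correlation(X, y):
    """Two-group case only: Rc^2 = 1 - Wilks' lambda = 1 - det(W) / det(T)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dev = X - X.mean(axis=0)
    T = dev.T @ dev                      # total sums of squares and cross-products
    W = np.zeros_like(T)                 # pooled within-groups counterpart
    for g in np.unique(y):
        c = X[y == g] - X[y == g].mean(axis=0)
        W += c.T @ c
    return float(1.0 - np.linalg.det(W) / np.linalg.det(T))

# Hypothetical data: one interval predictor plus a set of two dummies.
y_demo = np.array([0] * 6 + [1] * 6)
x_interval = np.array([1, 2, 3, 4, 5, 6, 3, 4, 5, 6, 7, 8], dtype=float)
d1 = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0], dtype=float)
d2 = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0], dtype=float)

rc2_reduced = squared_canonical_correlation(x_interval[:, None], y_demo)
rc2_full = squared_canonical_correlation(np.column_stack([x_interval, d1, d2]), y_demo)
dummy_effect = rc2_full - rc2_reduced    # explanatory effect of the dummy set as a whole
```

The difference `dummy_effect` cannot be negative, since adding predictors never lowers the canonical correlation; whether it is large enough to matter is a separate judgment.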
Alternatively, for interval independents, one can correlate the discriminant function scores with the independents. The discriminating variables which matter the most to a particular function will be correlated highest with the DA scores.
In SPSS there are several available criteria for entering or removing variables at each step: Wilks’ lambda, unexplained variance, Mahalanobis’ distance, smallest F ratio, and Rao’s V. In most statistical packages, the researcher sets the critical significance level through the "F to remove" setting.
Stepwise procedures are sometimes said to eliminate the problem of multicollinearity, but this is misleading. The stepwise procedure uses an intelligent criterion to set order, but it certainly does not eliminate the problem of multicollinearity. To the extent that independents are highly intercorrelated, the standard errors of their standardized discriminant coefficients will be inflated and it will be difficult to assess the relative importance of the independent variables.
The researcher should keep in mind that the stepwise method capitalizes on chance associations, so the true significance levels are worse (that is, numerically higher) than the alpha rate reported. Thus a reported significance level of .05 may correspond to a true alpha rate of .10 or worse. For this reason, if stepwise discriminant analysis is employed, use of cross-validation is recommended. In the split halves method, the original dataset is split in two at random and one half is used to develop the discriminant equation and the other half is used to validate it.
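A minimal sketch of split-halves validation follows. It makes several assumptions that are not in the text: the split is done at random within each group (so both halves contain every group), cases are classified to the nearest group centroid by Mahalanobis distance using the pooled within-groups covariance with equal priors, and all function names and data are hypothetical.

```python
import numpy as np

def fit_linear_da(X, y):
    """Estimate group centroids and the inverse pooled within-groups covariance."""
    groups = np.unique(y)
    centroids = np.array([X[y == g].mean(axis=0) for g in groups])
    W = np.zeros((X.shape[1], X.shape[1]))
    for g, m in zip(groups, centroids):
        c = X[y == g] - m
        W += c.T @ c
    return groups, centroids, np.linalg.inv(W / (len(X) - len(groups)))

def classify(model, X):
    """Assign each case to the group with the nearest centroid (Mahalanobis distance)."""
    groups, centroids, Sw_inv = model
    diffs = X[:, None, :] - centroids[None, :, :]            # shape (cases, groups, vars)
    d2 = np.einsum("nkp,pq,nkq->nk", diffs, Sw_inv, diffs)   # squared distances
    return groups[np.argmin(d2, axis=1)]

def split_half_hit_rate(X, y, seed=0):
    """Develop the model on one random half and report the hit rate on the other half."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train = np.zeros(len(y), dtype=bool)
    for g in np.unique(y):                                   # random split within each group
        idx = rng.permutation(np.flatnonzero(y == g))
        train[idx[: len(idx) // 2]] = True
    model = fit_linear_da(X[train], y[train])
    predicted = classify(model, X[~train])
    return float(np.mean(predicted == y[~train]))

# Hypothetical, very well separated groups.
rng_demo = np.random.default_rng(1)
X_demo = np.vstack([rng_demo.normal(0, 1, (40, 2)), rng_demo.normal(50, 1, (40, 2))])
y_demo = np.repeat(["a", "b"], 40)
hit_rate = split_half_hit_rate(X_demo, y_demo)
```

The hit rate on the validation half is the figure of interest; a marked drop relative to the hit rate on the development half would suggest the stepwise model capitalized on chance.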
One can also locate the group centroid for each group of the dependent in discriminant space in the same manner.
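Locating the centroids can be sketched as follows, assuming that a group centroid is simply the mean of the discriminant scores of the cases in that group and that scores are centered on the grand mean. The function name `group_centroids`, the data, and the coefficient matrix are hypothetical values chosen for illustration, not estimates from a fitted model.

```python
import numpy as np

def group_centroids(X, y, coef):
    """Mean discriminant score of each group on each function, keyed by group label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = (X - X.mean(axis=0)) @ coef        # one column per discriminant function
    return {g.item(): scores[y == g].mean(axis=0) for g in np.unique(y)}

# Hypothetical data and coefficients (two predictors, two functions).
X_demo = np.array([[1, 2], [3, 4], [5, 0], [7, 2]], dtype=float)
y_demo = np.array([0, 0, 1, 1])
coef_demo = np.array([[0.5, 0.1],
                      [-0.2, 0.3]])
centroids = group_centroids(X_demo, y_demo, coef_demo)
```

Each entry of `centroids` is a point in discriminant space, so with two functions the centroids can be used directly as x and y coordinates.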
In the case of two discriminant functions, cases or group centroids may be plotted on a two-dimensional scatterplot of discriminant space (a canonical plot). Even when there are more than two functions, interpretation of the eigenvalues may reveal that only the first two functions are important and worthy of plotting.
Copyright 1998, 2008 by G. David Garson.
Last update, 3/24/2008.