Overview
Note that maximum likelihood data imputation can also be implemented in AMOS, the structural equation program supported by SPSS.
For purposes of univariate analysis (e.g., understanding the frequency distribution of how subjects respond to an opinion item), imputation can reduce bias and is often used for this purpose if data are missing at random. There are two forms of randomly missing data, MCAR (missing completely at random) and MAR (missing at random):
If data are MCAR, then the researcher may choose listwise or pairwise deletion of cases. If data are not MCAR, missing values should be imputed. See "Estimation methods" below.
It should be noted that imputation has not been common in social science, since statistical objections can be raised about any of the methods which might be used. However, since the 1980s maximum likelihood and multiple imputation methods have been developed which may change this generalization. An example of imputation for MAR data, used by the Bureau of the Census for instance, is the hot deck method: the missing value in the current case is replaced with the most recent non-missing value from a case in the same subsample. Stata and some other statistical packages implement the hot deck method. The user specifies which variables form the strata (these variables should correlate with the item for which values are to be estimated), then multiple samples are taken from each stratum to derive the estimate of the missing value for the given item for cases in that stratum. Maximum likelihood estimation (MLE) is also used for MAR imputation.
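The "most recent non-missing value from the same subsample" rule can be sketched in a few lines of Python. This is only an illustrative sketch of a sequential hot deck, not the Census Bureau's or Stata's implementation. The function name hot_deck_impute, the field names, and the data are hypothetical, and None is assumed to mark a missing value.

```python
def hot_deck_impute(records, strata_key, item_key):
    """Sequential hot deck: fill a missing item with the most recent
    non-missing value seen in the same stratum (file order matters)."""
    last_seen = {}  # stratum -> most recent observed (donor) value
    imputed = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left unchanged
        stratum = rec[strata_key]
        if rec[item_key] is None:
            # Remains None if no donor has appeared yet in this stratum.
            rec[item_key] = last_seen.get(stratum)
        else:
            last_seen[stratum] = rec[item_key]
        imputed.append(rec)
    return imputed


# Hypothetical data: "region" forms the strata, "income" has missing values.
cases = [
    {"region": "north", "income": 40},
    {"region": "south", "income": 55},
    {"region": "north", "income": None},
    {"region": "south", "income": None},
    {"region": "north", "income": 47},
    {"region": "north", "income": None},
]
filled = hot_deck_impute(cases, "region", "income")
print([c["income"] for c in filled])  # [40, 55, 40, 55, 47, 47]
```

Because the donor is whichever case happens to precede the missing one, the result depends on how the file is sorted. This is one reason the strata variables should correlate with the item being imputed.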
The MVAR module in SPSS displays three types of 'indicator variable statistics' under its 'Descriptives' options, where the indicator variable is a dichotomous variable SPSS creates to flag whether or not the value of a given variable is present or missing.
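The idea of a missingness indicator can be illustrated outside SPSS. The sketch below is an assumption-laden illustration, not a reproduction of the SPSS output: it builds a dichotomous flag for one variable and then compares another variable across the "present" and "missing" groups. The variable names and data are hypothetical, and None is assumed to mark a missing value.

```python
from statistics import mean


def missing_indicator(values):
    """Return 1 where the value is missing (None) and 0 where it is present."""
    return [1 if v is None else 0 for v in values]


# Hypothetical data: income has missing values, age is fully observed.
income = [40, None, 52, None, 61, 48]
age = [30, 62, 41, 58, 35, 44]

flag = missing_indicator(income)
pct_missing = 100 * sum(flag) / len(flag)

# Compare a second variable across the two groups formed by the indicator.
age_when_present = mean(a for a, f in zip(age, flag) if f == 0)
age_when_missing = mean(a for a, f in zip(age, flag) if f == 1)

print(flag)
print(round(pct_missing, 1), age_when_present, age_when_missing)
```

A large difference between the two group means, as in this made-up example, would suggest that missingness on income is related to age and therefore that the data are not MCAR.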
Pairwise deletion omits a case only from those calculations that use a variable on which the case has missing data. This means that different calculations (e.g., different correlation coefficients) will use different cases and will have different sample sizes (different n's). This effect is undesirable (and in some procedures, such as structural equation modeling, may prevent a solution altogether), but pairwise deletion may be necessary when overall sample size is small or the number of cases with missing data is large. Even then, misinterpretation may well result unless data are missing completely at random (MCAR).
Selecting "listwise" in SPSS displays the means, correlation matrix, and covariance matrix, omitting cases that have missing values in any variable under consideration (listwise deletion). Selecting "pairwise" in SPSS displays for each pair of quantitative variables the number of pairwise nonmissing values, and the pairwise mean, variance, covariance, and correlation.
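The difference between the two deletion rules can be made concrete with a small Python sketch. It is an illustration only and does not reproduce SPSS output. The function names, variable names, and data are hypothetical, and None is assumed to mark a missing value.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlations(rows, names, listwise):
    """Return {(a, b): (n, r)} under listwise or pairwise deletion."""
    if listwise:
        # Listwise: drop a case if ANY variable under consideration is missing.
        rows = [r for r in rows if all(r[v] is not None for v in names)]
    result = {}
    for a, b in combinations(names, 2):
        # Pairwise: drop a case only if one of THESE two variables is missing.
        pairs = [(r[a], r[b]) for r in rows
                 if r[a] is not None and r[b] is not None]
        xs, ys = zip(*pairs)
        result[(a, b)] = (len(pairs), pearson(xs, ys))
    return result


# Hypothetical data with scattered missing values.
rows = [
    {"x": 1, "y": 2, "z": 1},
    {"x": 2, "y": 1, "z": 2},
    {"x": 3, "y": 4, "z": 3},
    {"x": 4, "y": None, "z": 5},
    {"x": 5, "y": 5, "z": 4},
    {"x": None, "y": 6, "z": None},
]
lw = correlations(rows, ["x", "y", "z"], listwise=True)
pw = correlations(rows, ["x", "y", "z"], listwise=False)
for pair in lw:
    print(pair, "listwise n =", lw[pair][0], "pairwise n =", pw[pair][0])
```

In this made-up data set every listwise correlation rests on the same 4 cases, while the pairwise correlation of x and z rests on 5. The coefficients in the pairwise matrix therefore do not all describe the same set of cases.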
In SPSS the user can specify the data distribution assumptions to be used by the EM algorithm: normal, mixed normal, and Student's t. For a mixed normal assumption, the user can specify the proportion and the standard deviation ratio. The user can also set the maximum number of iterations attempted in the iterative MLE process used by the EM algorithm (restricting the number of iterations is not recommended, as the procedure may then stop even if estimates have not yet converged).
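To show what the iteration limit controls, here is a minimal sketch of the EM idea for the simplest case. It is not the SPSS algorithm. It assumes a bivariate normal distribution only (not mixed normal or Student's t), a fully observed x, and a y that is missing at random on some cases. The function name em_bivariate, its parameters max_iter and tol, and the data are all hypothetical.

```python
def em_bivariate(x, y, max_iter=200, tol=1e-8):
    """EM estimates of mean(y), var(y), cov(x, y) when some y are None.
    Assumes x is complete and (x, y) is bivariate normal."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n  # ML variance (divide by n)
    observed = [yi for yi in y if yi is not None]
    # Starting values taken from the observed y only; covariance starts at 0.
    my = sum(observed) / len(observed)
    syy = sum((yi - my) ** 2 for yi in observed) / len(observed)
    sxy = 0.0
    converged = False
    iterations = 0
    for iterations in range(1, max_iter + 1):
        slope = sxy / sxx
        resid_var = syy - sxy ** 2 / sxx
        # E-step: expected y and y^2 for each case given current estimates.
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                pred = my + slope * (xi - mx)
                ey.append(pred)
                ey2.append(pred ** 2 + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi ** 2)
        # M-step: re-estimate the parameters from the expected statistics.
        new_my = sum(ey) / n
        new_syy = sum(ey2) / n - new_my ** 2
        new_sxy = sum(xi * e for xi, e in zip(x, ey)) / n - mx * new_my
        change = max(abs(new_my - my), abs(new_syy - syy), abs(new_sxy - sxy))
        my, syy, sxy = new_my, new_syy, new_sxy
        if change < tol:
            converged = True
            break
    return {"mean_y": my, "var_y": syy, "cov_xy": sxy,
            "iterations": iterations, "converged": converged}


# Hypothetical data: y is missing for two of six cases.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, None, 8.2, None, 11.8]

full = em_bivariate(x, y)
capped = em_bivariate(x, y, max_iter=2)
print(full["converged"], full["iterations"], round(full["mean_y"], 4))
print(capped["converged"], round(capped["mean_y"], 4))
```

The second call stops after two iterations and reports converged as False. This is the situation the caution about limiting iterations refers to: estimates are returned, but they are not yet the maximum likelihood values.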
Solas (Statistical Solutions, 1998) is imputation software based on the ABB algorithm described above, distributed by Statistical Solutions Ltd.