
This page describes the format of the exams in either PA 765 or PA 766. I have not yet written the final exam for this semester and do not guarantee that it will follow the same format as in the past. However, past finals have included one or more of three types of questions: (1) closed-book short answer questions; (2) analysis of output, either provided or generated hands-on by you; and (3) analysis of scenarios.
Short Answer Items. Tests sometimes include 10 to 25 short answer questions. Here are some examples of short answer items used in the past, with answers:
The 2000 General Social Survey is located at: Y:\PC\DATASETS\GSS\gss2000\GSS2000nonmiss.sav. The cumulative codebook for the GSS, including 2000, is located at: http://www.icpsr.umich.edu/GSS/. There are a number of variables which measure "confidence in government." These are: CONFED CONJUDGE CONLEGIS CONARMY. The GSS has any number of other variables which may be used as independents.
Your assignment is to answer as best you can the question, "What causes confidence in government?" Even though it may be a bit redundant to utilize two statistical techniques rather than one, you are to utilize two techniques of your choice (but ones covered in a weekly topic of this course). To get an "A" on the exam, one of these must be structural equation modeling. Write a quantitative essay answering this question. Naturally, explain your methodological reasoning fully, step by step (you may want to take notes on your word processor as you execute SPSS commands). Include all computer output as an appendix.
Hints:
2. You will have to deal with Don't Know and other off-scale responses. Code 0, 8, and 9 as missing in the Variable View tab of the SPSS data editor. You may wish to use Transform, Replace Missing Values, Linear interpolation to impute data.
3. If you are creating latent variables, you will want to recall:
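As an aside on hint 2 above, the recode-then-interpolate step can be sketched outside SPSS. This is only a rough illustration, not SPSS's own implementation: the response values are made up, the function names are hypothetical, and the sketch assumes that interpolation runs between the nearest valid neighbors in file order and leaves leading or trailing gaps unfilled.

```python
# Hypothetical responses on a confidence item; 0, 8 and 9 are off-scale codes.
MISSING_CODES = {0, 8, 9}


def recode_missing(values, missing_codes=MISSING_CODES):
    """Replace off-scale codes with None so they are treated as missing."""
    return [None if v in missing_codes else v for v in values]


def linear_interpolate(values):
    """Fill interior gaps on a straight line between the nearest valid
    neighbors (in file order). Leading and trailing gaps stay None."""
    filled = list(values)
    valid = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(valid, valid[1:]):
        span = right - left
        for i in range(left + 1, right):
            step = (values[right] - values[left]) * (i - left) / span
            filled[i] = values[left] + step
    return filled


raw = [1, 2, 8, 3, 0, 9, 1, 9]          # made-up data, not GSS values
recoded = recode_missing(raw)
imputed = linear_interpolate(recoded)
print(recoded)
print(imputed)
```

Note that interpolated values depend on the order of cases in the file, which is arbitrary for survey data, so this kind of imputation should be reported and defended in your methodological discussion.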
Discussion: An assumption of discriminant analysis is that the two dependent categories not be extremely unequal in size, which is the case here. When coding dummy variables, it is better to pick a middle or other meaningful category as the omitted reference group, not just automatically leave out the last group. Wilks' lambda tests the first discriminant function. In spite of the reference to "functions" there can only be one when the dependent has only two values. When the significance of Box's M is below .05, the covariance matrices differ significantly between groups of the dependent, violating an assumption of discriminant analysis. The standardized discriminant coefficients may be used to assess the predictive power of the interval independents, but not the dummies, which must be evaluated as a group using hierarchical discriminant analysis (seeing the difference in the squared canonical correlation with and without the entire set of dummies). It would have been helpful to print out the classification table to see how well the discriminant function was classifying cases. In this and all subsequent scenarios, it would have been preferable to compare two models rather than seek to validate one. In this and all subsequent scenarios, the validity of the conclusions assumes the model being tested is properly specified, neither including irrelevant variables nor omitting important variables.
Discussion: Cronbach's alpha >= .7 is the proper test, though it does not rule out the possibility of a three-item scale, for instance. That is, it is not equivalent to demonstrating the four items are four separate dimensions of productivity. Nonetheless MANOVA is a suitable procedure for multiple interval dependents, and 7-point Likert scales are accepted by most researchers for this purpose. The F test for Wilks' lambda is the standard one when there are more than two groups formed by the independents, as here. If it is as reported above, the F test supports the researcher's conclusions, but there are complications. First, the interaction of merit pay with praise may be significant even if merit pay itself is not, and this is not discussed. Second, significance in the multivariate F test does not show there is an effect on each of the four productivity variables individually. Univariate ANOVAs are also part of the MANOVA output and would throw light on this, but these are not mentioned in the scenario. Also, information is lost when an interval variable like merit pay increment is reduced to a categorical variable. Collapsing data in this way has unpredictable effects, commonly creating a bias toward lower correlation than actually exists. Instead the researcher could have left merit pay increment as interval and used it as a covariate in MANCOVA, whose multivariate F tests test covariate as well as main and interaction effects. Main and interaction effects are not even reported in the scenario.
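The point about collapsing an interval variable can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the scenario: the data are simulated, the variable names are hypothetical stand-ins, and the true correlation of .6 is set arbitrarily.

```python
import random


def pearson(x, y):
    """Pearson correlation computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated (not real) data: a standardized merit pay increment and a
# productivity score built to correlate about .6 with it.
merit = [rng.gauss(0, 1) for _ in range(5000)]
productivity = [0.6 * m + rng.gauss(0, 0.8) for m in merit]

# Collapse the interval variable into low/high groups at the median.
cut = sorted(merit)[len(merit) // 2]
merit_high = [1 if m >= cut else 0 for m in merit]

r_interval = pearson(merit, productivity)
r_collapsed = pearson(merit_high, productivity)
print(round(r_interval, 3), round(r_collapsed, 3))
```

With a median split of a normally distributed variable, the correlation is expected to shrink to roughly 80 percent of its interval-level value. Other cut points and other distributions attenuate by different amounts, which is part of why the effect of collapsing is hard to predict.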
Discussion: This is generally a correct procedure, but there are some important qualifications. The number of cases is below the normal level considered appropriate for factor analysis. The assessment of the relative importance of the two factors is of little import if the two factors do not explain much of the variance in bond ratings, and R-square is not reported. Likewise, the standard error of the beta coefficients is not reported, so one has no information on the stability of the coefficients. The factor loadings of the indicators on the factors are not presented in the scenario, so it is ambiguous whether there is a simple factor structure, or if the factors are more difficult to interpret than the "prosperity" and "fiscal" labels seem to indicate. Also keep in mind, particularly if R-square is not high, that there may be other independent variables and their entry into the regression equation will change the beta weights, so the researcher's conclusions are only correct if one assumes the model is properly and sufficiently specified with only the two factors as independents. Note one cannot omit the regression and simply compare eigenvalues in factor analysis: that comparison shows which factor explains most of the variance in all the variables (the dependent, bond ratings, is not included in the factor analysis), whereas the betas for the factor scores indicate variance explained in the dependent.
Discussion: This is basically correct, although again with some qualifications. The scenario does not report if there is more than one canonical correlation; the economic and educational variables may be related on more than the first dimension reported. There is no discussion of redundancy analysis or reporting of structure correlations so one does not know how well the canonical economic and educational variates predict the original indicators. Note that although the number of cases, 62, is well below the normally prescribed minimum (20 times the number of variables; with 17 variables, that is 340), this is not an issue here as this is an enumeration, not a sample. Also, expressing the result as "shared variance" rather than the usual "percent of variance explained" is acceptable, though this is shared with respect to the latent variables. The structure correlations would need to be reported to understand the extent to which all 12 economic and 5 educational variables were related to the latent variables, and one might want to do redundancy analysis for similar reasons.
Discussion: Some criticisms may be made regarding the randomness of the sampling design, but the design is within the realm of common practice. The response rate is not given and there is no mention of any analysis of non-response. Logistic regression is appropriate. Some of the logits may not be significant and the Log-Likelihood Ratio and/or the Wald statistic (with corresponding probability level) should have been used to test this, possibly leading to dropping some of the independents. If irrelevant variables are not dropped, the standardized logit coefficients will not be interpreted properly. Likewise, one is assuming all relevant variables are in the model; improper specification will also lead to incorrect interpretation of the standardized logits. A goodness-of-fit test (e.g., model chi-square) should have been used to show that the overall model was significant. One should report not only the relative importance of the independents, but also how well the logistic regression predicted the dependent. This is done using the classification table, the c statistic, and/or Nagelkerke's R-Square (or a similar measure). Additional quibbles: one could have added interaction terms, one could test for linearity, multicollinearity could be discussed, and it might be that the DMV could have provided a more appropriate sampling frame.
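Two of the predictive measures named above, the classification table and Nagelkerke's R-Square, can be computed by hand from observed outcomes and predicted probabilities. The sketch below is illustrative only: the ten cases and their probabilities are invented rather than taken from a fitted model, the function names are hypothetical, and the .5 cutoff is an assumption (though a conventional one).

```python
import math

# Invented observed outcomes (1 = event) and predicted probabilities.
observed = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.35, 0.2, 0.1]


def classification_table(y, p, cutoff=0.5):
    """Count cases keyed by (observed, predicted) group membership."""
    table = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for yi, pi in zip(y, p):
        table[(yi, 1 if pi >= cutoff else 0)] += 1
    return table


def log_likelihood(y, p):
    """Bernoulli log-likelihood of the outcomes given probabilities."""
    return sum(math.log(pi if yi == 1 else 1 - pi) for yi, pi in zip(y, p))


def nagelkerke_r2(y, p):
    """Cox and Snell R-square rescaled by its maximum possible value."""
    n = len(y)
    base_rate = sum(y) / n
    ll_null = log_likelihood(y, [base_rate] * n)  # intercept-only model
    ll_model = log_likelihood(y, p)
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell


table = classification_table(observed, predicted)
percent_correct = (table[(0, 0)] + table[(1, 1)]) / len(observed)
r2 = nagelkerke_r2(observed, predicted)
print(table, percent_correct, round(r2, 3))
```

The classification table should be judged against the base rate: if one category of the dependent is much more common than the other, a high percent correct can be achieved by always predicting the larger group.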
Discussion: