This content is now available from Statistical Associates Publishers.
Below is the unformatted overview and table of contents.
Overview
Univariate GLM is the general linear model now often used to implement such long-established statistical procedures as regression and members of the anova family. It is "general" in the sense that one may implement both regression and anova models. One may also have fixed factors, random factors, and covariates as predictors. In GLM one may also have multiple dependent variables, as discussed in a separate section on multivariate GLM, and one may have linear transformations and/or linear combinations of dependent variables. Moreover, one can apply multivariate tests of significance when modeling correlated dependent variables, rather than relying on individual univariate tests as in multiple regression. GLM also handles repeated measures designs. Finally, because GLM uses a generalized inverse of the matrix of independent variables' correlations with each other, it can handle redundant independents, which would prevent a solution in ordinary regression models.
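The point about redundant independents can be illustrated with a minimal sketch. It assumes NumPy, made-up toy data, and the Moore-Penrose pseudoinverse of the cross-product matrix as the generalized inverse; statistical packages may use a different generalized inverse, so this shows the idea rather than any package's exact computation.

```python
import numpy as np

# Hypothetical toy data; x3 is an exact linear combination of x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
x3 = x1 + x2
y = np.array([3.1, 2.9, 7.2, 6.8, 11.1, 10.9])

# Design matrix with an intercept and all three predictors.
X = np.column_stack([np.ones_like(x1), x1, x2, x3])
XtX = X.T @ X

# The cross-product matrix has rank 3, not 4, so it has no ordinary inverse.
rank = np.linalg.matrix_rank(XtX, tol=1e-8)

# A generalized inverse still yields one valid least-squares solution.
b = np.linalg.pinv(XtX, rcond=1e-10) @ X.T @ y
fitted = X @ b

# Dropping the redundant column gives the same fitted values.
X_reduced = X[:, :3]
b_reduced, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)
fitted_reduced = X_reduced @ b_reduced

print(rank, np.allclose(fitted, fitted_reduced))
```

The individual coefficients are not unique when predictors are redundant, but the fitted values are, which is why the two sets of predictions agree.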
Data requirements. In all GLM models, the dependent(s) is/are continuous. The independents may be categorical factors (including both numeric and string types) or quantitative covariates. Data are assumed to come from a random sample for purposes of significance testing. The variance(s) of the dependent variable(s) is/are assumed to be the same for each cell formed by categories of the factor(s) (this is the homogeneity of variances assumption).
Regression in GLM is simply a matter of entering the independent variables as covariates. If there are sets of dummy variables (e.g., Region, which would be translated into dummy variables in OLS regression, such as South = 1 or 0), the set variable (e.g., Region) is entered as a fixed factor, with no need for the researcher to create dummy variables manually. The b coefficients will be identical whether the regression model is run under ordinary regression (in SPSS, under Analyze, Regression, Linear) or under GLM (in SPSS, under Analyze, General Linear Model, Univariate). Whereas b coefficients are default output for regression in SPSS, in GLM the researcher must ask for "Parameter estimates" under the Options button. The R-square from the Regression procedure will equal the partial eta-squared reported for the corrected model in the GLM regression output.
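As a rough illustration of what automatic dummy coding involves, the sketch below codes a hypothetical Region factor by hand and fits it by least squares. The data are invented, NumPy is assumed, and the choice of the last level as the reference category is an assumption made for the example.

```python
import numpy as np

# Hypothetical data: three regions with three cases each.
region = np.array(["North", "South", "West"] * 3)
y = np.array([10.0, 14.0, 9.0, 12.0, 15.0, 11.0, 11.0, 16.0, 10.0])

# One dummy per level except the last, which serves as the reference (assumption).
levels = sorted(set(region.tolist()))
dummies = np.column_stack([(region == lev).astype(float) for lev in levels[:-1]])
X = np.column_stack([np.ones(len(y)), dummies])

# Ordinary least squares: intercept, then one b coefficient per dummy.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ b
ss_error = float(residuals @ residuals)
ss_total = float(((y - y.mean()) ** 2).sum())
ss_model = ss_total - ss_error

r_squared = 1.0 - ss_error / ss_total
eta_squared_model = ss_model / (ss_model + ss_error)

print(b, r_squared, eta_squared_model)
```

Here the intercept is the mean of the reference region and each b coefficient is the difference between a region's mean and the reference mean. The two effect size figures coincide because both are the model sum of squares divided by the total sum of squares.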
The advantages of doing regression via the GLM procedure are that dummy variables are coded automatically, it is easy to add interaction terms, and it computes eta-squared (identical to R-squared when relationships are linear, but greater if nonlinear relationships are present). However, the SPSS regression procedure would still be preferred if the researcher wishes output of standardized regression (beta) coefficients, wishes to do multicollinearity diagnostics, or wishes to do stepwise regression or to enter independent variables hierarchically, in blocks. PROC GLM in SAS has a greater range of options and outputs (SAS also has PROC ANOVA, but it handles only balanced designs/equal group sizes).
Anova family
Although regression models may be run easily in GLM, as a practical matter univariate GLM is used primarily to run analysis of variance (anova) and analysis of covariance (ancova) models. Multivariate GLM is used primarily to run multivariate analysis of variance (Manova) and multivariate analysis of covariance (Mancova) models. Multivariate GLM is a separate module in SPSS. In SAS it is implemented within PROC GLM using the MANOVA statement.
Analysis of variance (anova) is used to uncover the main and interaction effects of categorical independent variables (called "factors") on an interval dependent variable. A "main effect" is the direct effect of an independent variable on the dependent variable. An "interaction effect" is the joint effect of two or more independent variables on the dependent variable. Whereas regression models cannot handle interaction unless explicit crossproduct interaction terms are added, anova uncovers interaction effects on a built-in basis. For the case of multiple dependents, discussed separately, multivariate GLM implements multivariate analysis of variance (Manova), including a variant which supports control variables as covariates (Mancova).
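The distinction between main and interaction effects can be made concrete with cell means. The sketch below uses an invented, balanced 2 x 2 design (the factor names "dose" and "drug" are hypothetical) and only the Python standard library; it computes descriptive effects, not significance tests.

```python
from statistics import mean

# Hypothetical balanced 2 x 2 design: scores for each (dose, drug) cell.
cells = {
    ("low", "A"): [4.0, 6.0],
    ("low", "B"): [5.0, 7.0],
    ("high", "A"): [8.0, 10.0],
    ("high", "B"): [15.0, 17.0],
}
m = {cell: mean(scores) for cell, scores in cells.items()}

# Simple effects of dose within each drug.
dose_effect_at_a = m[("high", "A")] - m[("low", "A")]
dose_effect_at_b = m[("high", "B")] - m[("low", "B")]

# In a balanced design, a main effect is the average of the simple effects.
main_effect_dose = mean([dose_effect_at_a, dose_effect_at_b])
main_effect_drug = mean([
    m[("low", "B")] - m[("low", "A")],
    m[("high", "B")] - m[("high", "A")],
])

# Interaction: the effect of dose differs depending on the drug.
interaction = dose_effect_at_b - dose_effect_at_a

print(main_effect_dose, main_effect_drug, interaction)
```

A nonzero interaction contrast means the effect of one factor depends on the level of the other. In a regression formulation, the same quantity would be carried by the coefficient on a crossproduct term.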
The key statistic in anova is the F-test of difference of group means, testing if the means of the groups formed by values of the independent variable (or combinations of values for multiple independent variables) are different enough not to have occurred by chance. If the group means do not differ significantly then it is inferred that the independent variable(s) did not have an effect on the dependent variable. If the F test shows that overall the independent variable(s) is (are) related to the dependent variable, then multiple comparison tests of significance are used to explore just which values of the independent(s) have the most to do with the relationship.
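For a one-way between-groups design, the F ratio is the between-groups mean square divided by the within-groups mean square. The following is a minimal sketch using invented data and the standard library only; the function name `one_way_f` is hypothetical, and obtaining a p-value would additionally require the F distribution, which is not computed here.

```python
# Hypothetical scores for three groups formed by one factor.
groups = {
    "g1": [2.0, 3.0, 4.0],
    "g2": [5.0, 6.0, 7.0],
    "g3": [8.0, 9.0, 10.0],
}

def one_way_f(samples):
    """Return (F, df_between, df_within) for a one-way between-groups anova."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((v - sum(s) / len(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

f_value, df1, df2 = one_way_f(list(groups.values()))
print(f_value, df1, df2)
```

With these data the between-groups sum of squares is 54 on 2 degrees of freedom and the within-groups sum of squares is 6 on 6 degrees of freedom, so F = 27 / 1 = 27.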
If the data involve repeated measures of the same variable, as in before-after or matched pairs tests, the F-test is computed differently from the usual between-groups design, but the inference logic is the same. There is also a large variety of other anova designs for special purposes, all with the same general logic.
Note that analysis of variance tests the null hypothesis that group means do not differ. It is not a test of differences in variances, but rather assumes relative homogeneity of variances. Thus a key anova assumption is that the groups formed by the independent variable(s) have similar variances on the dependent variable ("homogeneity of variances"). Levene's test is standard for testing homogeneity of variances. Like regression, anova is a parametric procedure which assumes multivariate normality (the dependent has a normal distribution for each value category of the independent(s)).
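Levene's statistic can be viewed as a one-way anova F ratio computed on absolute deviations from group centers. The sketch below assumes the mean-centered version of the test (a median-centered variant also exists), uses invented data, and relies only on the standard library; the function name `levene_w` is hypothetical and no p-value is computed.

```python
# Hypothetical scores for two groups with visibly different spread.
samples = [
    [4.0, 5.0, 6.0],
    [1.0, 5.0, 9.0],
]

def levene_w(groups):
    """Return (W, df1, df2): an F ratio on absolute deviations from group means."""
    # Step 1: replace each score by its absolute deviation from its group mean.
    deviations = [[abs(v - sum(g) / len(g)) for v in g] for g in groups]
    # Step 2: run an ordinary one-way anova on those deviations.
    all_dev = [d for g in deviations for d in g]
    grand = sum(all_dev) / len(all_dev)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in deviations)
    ss_within = sum((d - sum(g) / len(g)) ** 2 for g in deviations for d in g)
    df1 = len(groups) - 1
    df2 = len(all_dev) - len(groups)
    return (ss_between / df1) / (ss_within / df2), df1, df2

w, df1, df2 = levene_w(samples)
print(w, df1, df2)
```

A large W relative to the F distribution with (df1, df2) degrees of freedom would suggest that the homogeneity of variances assumption is violated.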
Analysis of covariance (ancova) is used to test the main and interaction effects of categorical variables on a continuous dependent variable, controlling for the effects of selected other continuous variables which covary with the dependent. The control variable is called the "covariate." There may be more than one covariate. One may also perform planned comparison or post hoc comparisons to see which values of a factor contribute most to the explanation of the dependent. Ancova uses built-in regression using the covariates to predict the dependent, then does an anova on the residuals (the actual minus the predicted values of the dependent variable) to see if the factors are still significantly related to the dependent variable after the variation due to the covariates has been removed.
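One common way to express the covariate-adjusted test is as a comparison of two regression models: one with the covariate only, and one with the covariate plus the factor. The sketch below takes that model-comparison approach as an assumption; it is not necessarily the exact computation a given package performs. It uses NumPy, invented data, and a hypothetical helper `sse_and_coefs`.

```python
import numpy as np

# Hypothetical data: one covariate x, one two-level factor, six cases.
x = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
group = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
y = np.array([3.5, 3.0, 5.5, 5.5, 8.0, 7.5])

def sse_and_coefs(X, y):
    """Fit by least squares; return the error sum of squares and coefficients."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid), b

ones = np.ones_like(y)
# Reduced model: the covariate alone predicts the dependent.
sse_reduced, _ = sse_and_coefs(np.column_stack([ones, x]), y)
# Full model: covariate plus the factor (coded as a single dummy).
sse_full, b_full = sse_and_coefs(np.column_stack([ones, x, group]), y)

df_factor = 1                 # one dummy for a two-level factor
df_error = len(y) - 3         # cases minus parameters in the full model
f_factor = ((sse_reduced - sse_full) / df_factor) / (sse_full / df_error)

# Group difference after adjusting for the covariate.
adjusted_difference = float(b_full[2])

print(f_factor, adjusted_difference)
```

The F ratio asks whether the factor explains variation in the dependent beyond what the covariate already accounts for, and the dummy's coefficient is the group difference with the covariate held constant.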
Ancova is used for three purposes:
• In quasi-experimental (observational) designs, to remove the effects of variables which modify the relationship of the categorical independents to the interval dependent.
• In experimental designs, to control for factors which cannot be randomized but which can be measured on an interval scale. Since randomization in principle controls for all unmeasured variables, the addition of covariates to a model is rarely or never needed in experimental research. If a covariate is added and it is uncorrelated with the treatment (independent) variable, it is difficult to interpret, as in principle it is controlling for something already controlled for by randomization. If the covariate is correlated with the treatment/independent, then its inclusion will lead the researcher to underestimate the effect size of the treatment factors (independent variables).
• In regression models, to fit regressions where there are both categorical and interval independents. (This third purpose has become displaced by binary and multinomial logistic regression and other multivariate methods. On ancova regression models, see Wildt and Ahtola, 1978: 52-54.)
All three purposes have the goal of reducing the error term in the model. Like other control procedures, ancova can be seen as a form of "what if" analysis, asking what would happen if all cases scored equally on the covariates, so that the effect of the factors over and beyond the covariates can be isolated. Ancova can be used in all anova designs, and the same assumptions apply.
GLM should be contrasted with more recent types of models, treated separately, including generalized linear models (GZLM, which incorporates nonlinear link functions of the dependent), linear mixed models (LMM, which handles multilevel data), and generalized linear mixed models (GLMM, which incorporates nonlinear link functions into LMM). Also available in SPSS is analysis of variance components (VC), a subset of LMM which performs many of the same functions as analysis of variance under GLM. A comparison of GLM with both LMM and VC, illustrated with data, is found in the section on linear mixed models. While both GLM and LMM accept the use of random effects in models, LMM is preferred when random effects are present for reasons given in the comparison.
Table of Contents
Overview
Key Concepts
Why testing means is related to variance in analysis of variance
One-way anova
Simple one-way anova in SPSS
Simple one-way anova in SAS
Two-way anova
Two-way anova in SPSS
Two-way anova in SAS
Multivariate or n-way anova
Regression models
Parameter estimates (b coefficients) for factor levels
Parameter estimates for dichotomies
Significance of parameter estimates
Research designs
Between-groups anova design
Completely randomized design
Full factorial anova
Balanced designs
Latin square designs
Graeco-Latin square designs
Randomized Complete Block Design (RCBD anova)
Split plot designs
Mixed design models
Random v. fixed effects models
In SPSS
In SAS
Linear mixed models (LMM) vs. general linear models (GLM)
Effects
Treating a random factor as a fixed factor
Mixed effects models
Nested designs
Nested designs
In SPSS
In SAS
Treatment by replication design
Within-groups (repeated measures) anova designs
Counterbalancing
Reliability procedure
Repeated measures GLM in SPSS
Repeated measures GLM in SAS
Interpreting repeated measures output
Variables
Types of variables
Dependent variable
Fixed and random factors
Covariates
WLS weights
Models and types of effects
Full factorial models
Effects
Main effects
Interaction effects
Residual effects
Effect size measures
Effect size coefficients based on percent of variance explained
Partial eta-squared
Omega-squared
Herzberg's R2
Intraclass correlation
Effect size coefficients based on standardized mean differences
Cohen's d
Glass's delta
Hedges' g
Significance tests
F-test
Reading the F value
Example 1
Example 2
Significance in two-way anova
Computation of F
F-test assumptions
Adjusted means
Lack of fit test
Power level and noncentrality parameter
Hotelling's T-Square
Planned multiple comparison t-tests
Simple t-test difference of means
Bonferroni-adjusted t-test
Sidak test
Dunnett's test
Hsu's multiple comparison with the best (MCB) test
Post-hoc multiple comparison tests
The q-statistic
Output formats: pairwise vs. multiple range
Tests assuming equal variances
Least significant difference (LSD) test
The Fisher-Hayter test
Tukey's test, a.k.a. Tukey honestly significant difference (HSD) test
Tukey-b test, a.k.a. Tukey's wholly significant difference (WSD) test
S-N-K or Student-Newman-Keuls test
Duncan test
Ryan test (REGWQ)
The Shaffer-Ryan test
The Scheffé test
Hochberg GT2 test
Gabriel test
Waller-Duncan test
Tests not assuming equal variances
Tamhane's T2 test
Games-Howell test
Dunnett's T3 test and Dunnett's C test
The Tukey-Kramer test
The Miller-Winer test
More than one multiple comparison/post hoc test
Example
Contrast tests
Overview
Types of contrasts
Deviation contrasts
Simple contrasts
Difference contrasts
Helmert contrasts
Repeated contrasts
Polynomial contrasts
Custom hypothesis tables
Custom hypothesis tables index table
Custom hypothesis tables
Estimated marginal means
Overview
EMM Estimates table
Other EMM output
EMM Pairwise comparisons table
EMM Univariate tests table
Profile plots
GLM Repeated Measures
Overview
Key Terms and Concepts
Within-subjects factor
Repeated measures dependent variables
Between-subjects factors
Covariates
Models
Type of sum of squares
Balanced vs. unbalanced models
Estimated marginal means
Pairwise comparisons
Statistics options in SPSS
Descriptive statistics
Hypothesis SSCP matrices
Partial eta-squared
Within-subjects SSCP matrix and within-subjects contrast effects.
Multivariate tests.
Univariate vs. multivariate models
Box's M test
Mauchly's test of sphericity
Univariate tests of within-subjects effects
Parameter estimates
Levene's test
Spread-versus-level plots
Residual plots
Lack of fit test
General estimable function
Post hoc tests
Overview
Profile plots for repeated measures GLM
Example
Contrast analysis for repeated measures GLM
Types of contrasts for repeated measures
Simple contrasts example
Saving variables in repeated measures GLM
Cook's distance
Leverage values
Assumptions
Interval data
Homogeneity of variances
Homogeneity of variance
Appropriate sums of squares
Multivariate normality
Adequate sample size
Equal or similar sample sizes
Random sampling
Orthogonal error
Data independence
Recursive models
Categorical independent variables
The independent variable is or variables are categorical.
Continuous dependent variables
Non-significant outliers
Sphericity
Assumptions related to ancova:
Limited number of covariates
Low measurement error of the covariate
Covariates are linearly related or in a known relationship to the dependent
Homogeneity of covariate regression coefficients
No covariate outliers
No high multicollinearity of the covariates
Additivity
Assumptions for repeated measures
Frequently Asked Questions
How do you interpret an anova table?
Isn't anova just for experimental research designs?
Should I standardize my data before using anova or ancova?
Since orthogonality (uncorrelated independents) is an assumption, and since this is rare in real life topics of interest to social scientists, shouldn't regression models be used instead of anova models?
Couldn't I just use several t-tests to compare means instead of anova?
How does counterbalancing work in repeated measures designs?
How is F computed in random effect designs?
What designs are available in anova for correlated independents?
If the assumption of homogeneity of variances is not met, should regression models be used instead?
Is anova a linear procedure like regression? How is linearity related to the "Contrasts" option?
What is hierarchical anova or ancova?
Is there a limit on the number of independents which can be included in an analysis of variance?
Which SPSS procedures compute anova?
I have several independent variables, which means there are a very large number of possible interaction effects. Does SPSS have to compute them all?
Do you use the same designs (between groups, repeated measures, etc.) with ancova as you do with anova?
How is GLM ancova different from traditional ancova?
What are paired comparisons (planned or post hoc) in ancova?
Can ancova be modeled using regression?
How does blocking with anova compare to ancova?
What is the SPSS syntax for GLM repeated measures?
What is a "doubly repeated measures design"?
Bibliography