This content is now available from Statistical Associates Publishers.

Below is the unformatted overview and table of contents.

Overview
Univariate GLM is the general linear model now often used to implement such long-established statistical procedures as regression and members of the anova family. It is "general" in the sense that one may implement both regression and anova models. One may also have fixed factors, random factors, and covariates as predictors. Also, in GLM one may have multiple dependent variables, as discussed in a separate section on multivariate GLM, and one may have linear transformations and/or linear combinations of dependent variables. Moreover, one can apply multivariate tests of significance when modeling correlated dependent variables, rather than relying on individual univariate tests as in multiple regression. GLM also handles repeated measures designs. Finally, because GLM uses a generalized inverse of the matrix of independent variables' correlations with each other, it can handle redundant independents which would prevent solution in ordinary regression models. 
Data requirements. In all GLM models, the dependent(s) is/are continuous. The independents may be categorical factors (including both numeric and string types) or quantitative covariates. Data are assumed to come from a random sample for purposes of significance testing. The variance(s) of the dependent variable(s) is/are assumed to be the same for each cell formed by categories of the factor(s) (this is the homogeneity of variances assumption). 

Regression in GLM is simply a matter of entering the independent variables as covariates. If there are sets of dummy variables (e.g., Region, which in OLS regression would be translated into dummy variables such as South = 1 or 0), the set variable (e.g., Region) is entered as a fixed factor, with no need for the researcher to create dummy variables manually. The b coefficients will be identical whether the regression model is run under ordinary regression (in SPSS, under Analyze, Regression, Linear) or under GLM (in SPSS, under Analyze, General Linear Model, Univariate). Whereas b coefficients are default output for regression in SPSS, in GLM the researcher must ask for "Parameter estimates" under the Options button. The R-square from the Regression procedure will equal the partial eta-squared for the model as a whole (the corrected model) from the GLM regression model. 
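
The equivalence described above can be checked by hand for a small case. The following Python sketch is illustrative only: the data, the variable names (region, income), and the choice of the last category as the reference category are assumptions made for this example, not SPSS output. It builds the dummy variables that GLM would otherwise create automatically, fits the regression by ordinary least squares, and compares R-squared with eta-squared for the model as a whole.

```python
from statistics import mean


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: one categorical predictor (region), one continuous dependent.
region = ["South"] * 3 + ["North"] * 3 + ["West"] * 3
income = [20.0, 22.0, 24.0, 30.0, 31.0, 35.0, 25.0, 27.0, 29.0]

levels = sorted(set(region))   # ['North', 'South', 'West']
dummies = levels[:-1]          # assumption: the last level is the reference

# Design matrix: intercept plus one 0/1 dummy per non-reference level.
X = [[1.0] + [1.0 if r == lev else 0.0 for lev in dummies] for r in region]
b = ols(X, income)

# R-squared from the regression.
fitted = [sum(c * x for c, x in zip(b, row)) for row in X]
grand = mean(income)
ss_total = sum((y - grand) ** 2 for y in income)
ss_error = sum((y - f) ** 2 for y, f in zip(income, fitted))
r_squared = 1 - ss_error / ss_total

# Eta-squared from the anova decomposition (between-groups SS / total SS).
group_means = {lev: mean(y for r, y in zip(region, income) if r == lev) for lev in levels}
ss_between = sum((group_means[r] - grand) ** 2 for r in region)
eta_squared = ss_between / ss_total

print("b coefficients:", b)
print("R-squared:", r_squared, "eta-squared:", eta_squared)
```

With a single factor, the intercept is the mean of the reference category and each b coefficient is the difference between a category mean and the reference mean, which is why the manually coded regression and the factor-based model agree.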

The advantages of doing regression via the GLM procedure are that dummy variables are coded automatically, it is easy to add interaction terms, and it computes eta-squared (identical to R-squared when relationships are linear, but greater if nonlinear relationships are present). However, the SPSS regression procedure would still be preferred if the researcher wishes output of standardized regression (beta) coefficients, wishes to do multicollinearity diagnostics, or wishes to do stepwise regression or to enter independent variables hierarchically, in blocks. PROC GLM in SAS has a greater range of options and outputs (SAS also has PROC ANOVA, but it handles only balanced designs/equal group sizes). 

Anova family

Although regression models may be run easily in GLM, as a practical matter univariate GLM is used primarily to run analysis of variance (anova) and analysis of covariance (ancova) models. Multivariate GLM is used primarily to run multivariate analysis of variance (Manova) and multivariate analysis of covariance (Mancova) models. Multivariate GLM is a separate module in SPSS. In SAS it is implemented within PROC GLM using the MANOVA statement. 

Analysis of variance (anova) is used to uncover the main and interaction effects of categorical independent variables (called "factors") on an interval dependent variable. A "main effect" is the direct effect of an independent variable on the dependent variable. An "interaction effect" is the joint effect of two or more independent variables on the dependent variable. Whereas regression models cannot handle interaction unless explicit crossproduct interaction terms are added, anova uncovers interaction effects on a built-in basis. For the case of multiple dependents, discussed separately, multivariate GLM implements multivariate analysis of variance (Manova), including a variant which supports control variables as covariates (Mancova). 

The key statistic in anova is the F-test of difference of group means, testing if the means of the groups formed by values of the independent variable (or combinations of values for multiple independent variables) are different enough not to have occurred by chance. If the group means do not differ significantly then it is inferred that the independent variable(s) did not have an effect on the dependent variable. If the F test shows that overall the independent variable(s) is (are) related to the dependent variable, then multiple comparison tests of significance are used to explore just which values of the independent(s) have the most to do with the relationship. 
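
As a rough illustration of the F-test logic just described, the following Python sketch computes F for a one-way design as the ratio of the between-groups mean square to the within-groups mean square. The function name one_way_f and the three small groups are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way between-groups anova."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    # Between-groups SS: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-groups SS: spread of the cases around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three groups formed by one factor.
groups = [[20.0, 22.0, 24.0], [30.0, 31.0, 35.0], [25.0, 27.0, 29.0]]
f_value, df1, df2 = one_way_f(groups)
print("F =", f_value, "with df =", (df1, df2))
```

The larger the F value relative to the F distribution with these degrees of freedom, the less plausible it is that the group means differ only by chance. Obtaining the significance level itself requires the F distribution, which a statistics package would supply.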

If the data involve repeated measures of the same variable, as in before-after or matched pairs tests, the F-test is computed differently from the usual between-groups design, but the inference logic is the same. There is also a large variety of other anova designs for special purposes, all with the same general logic. 

Note that analysis of variance tests the null hypothesis that group means do not differ. It is not a test of differences in variances, but rather assumes relative homogeneity of variances. Thus a key anova assumption is that the groups formed by the independent variable(s) have similar variances on the dependent variable ("homogeneity of variances"). Levene's test is standard for testing homogeneity of variances. Like regression, anova is a parametric procedure which assumes multivariate normality (the dependent has a normal distribution for each value category of the independent(s)). 
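
To make the homogeneity check concrete, the sketch below computes the Levene statistic in its mean-based form: a one-way anova F computed on the absolute deviations of each case from its own group mean. The function name levene_w and the data are hypothetical, and a median-based variant (Brown-Forsythe) also exists, so treat this as one possible version rather than the output of any particular package.

```python
from statistics import mean


def levene_w(groups):
    """Mean-based Levene statistic: anova F on absolute deviations from group means."""
    # Replace each score with its absolute distance from its group mean.
    z = [[abs(v - mean(g)) for v in g] for g in groups]
    all_z = [v for g in z for v in g]
    grand = mean(all_z)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df_between = len(z) - 1
    df_within = len(all_z) - len(z)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores for three groups.
groups = [[20.0, 22.0, 24.0], [30.0, 31.0, 35.0], [25.0, 27.0, 29.0]]
w = levene_w(groups)
print("Levene W =", w)
```

A small W, referred to the F distribution with (k - 1, N - k) degrees of freedom, gives no evidence against equal variances. A large W suggests the homogeneity assumption is violated.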

Analysis of covariance (ancova) is used to test the main and interaction effects of categorical variables on a continuous dependent variable, controlling for the effects of selected other continuous variables which covary with the dependent. The control variable is called the "covariate." There may be more than one covariate. One may also perform planned comparison or post hoc comparisons to see which values of a factor contribute most to the explanation of the dependent. Ancova uses built-in regression using the covariates to predict the dependent, then does an anova on the residuals (the actual minus the predicted values of the dependent variable) to see if the factors are still significantly related to the dependent variable after the variation due to the covariates has been removed. 
Ancova is used for three purposes: 

1. In quasi-experimental (observational) designs, to remove the effects of variables which modify the relationship of the categorical independents to the interval dependent. 

2. In experimental designs, to control for factors which cannot be randomized but which can be measured on an interval scale. Since randomization in principle controls for all unmeasured variables, the addition of covariates to a model is rarely or never needed in experimental research. If a covariate is added and it is uncorrelated with the treatment (independent) variable, it is difficult to interpret, as in principle it is controlling for something already controlled for by randomization. If the covariate is correlated with the treatment/independent, then its inclusion will lead the researcher to underestimate the effect size of the treatment factors (independent variables). 

3. In regression models, to fit regressions where there are both categorical and interval independents. (This third purpose has become displaced by binary and multinomial logistic regression and other multivariate methods. On ancova regression models, see Wildt and Ahtola, 1978: 52-54). 

All three purposes have the goal of reducing the error term in the model. Like other control procedures, ancova can be seen as a form of "what if" analysis, asking what would happen if all cases scored equally on the covariates, so that the effect of the factors over and beyond the covariates can be isolated. Ancova can be used in all anova designs and the same assumptions apply. 
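
The idea of isolating the effect of a factor over and beyond a covariate can be sketched as a comparison of two least-squares fits: one with the covariate only, and one with the covariate plus dummy variables for the factor. This is one common way to express the test, offered here as an illustration under stated assumptions rather than as the internal algorithm of SPSS or SAS. The data and all names (group, covariate, outcome, sse) are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sse(X, y):
    """Residual sum of squares after an OLS fit of y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data: a three-level factor, one covariate, one dependent variable.
group = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
covariate = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0, 3.0, 4.0, 5.0, 6.0]
outcome = [2.5, 3.5, 5.5, 8.5, 7.5, 8.5, 10.5, 13.5, 7.5, 8.5, 10.5, 13.5]

# Reduced model: intercept and covariate only.
X_reduced = [[1.0, x] for x in covariate]
# Full model: intercept, dummies for groups a and b (c is the reference), covariate.
X_full = [[1.0, float(g == "a"), float(g == "b"), x] for g, x in zip(group, covariate)]

sse_reduced = sse(X_reduced, outcome)
sse_full = sse(X_full, outcome)

df_factor = len(set(group)) - 1
df_error = len(outcome) - len(X_full[0])

# F for the factor: error reduction from adding the factor, per degree of
# freedom, relative to the error variance left in the full model.
f_factor = ((sse_reduced - sse_full) / df_factor) / (sse_full / df_error)
print("F for factor, controlling for covariate:", f_factor, (df_factor, df_error))
```

If adding the factor reduces the residual sum of squares by much more than would be expected from the remaining error variance, the factor is judged to be related to the dependent variable after the covariate has been taken into account.
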
GLM should be contrasted with more recent types of models, treated separately, including generalized linear models (GZLM, which incorporates nonlinear link functions of the dependent), linear mixed models (LMM, which handles multilevel data), and generalized linear mixed models (GLMM, which incorporates nonlinear link functions into LMM). Also available in SPSS is analysis of variance components (VC), a subset of LMM which performs many of the same functions as analysis of variance under GLM. A comparison of GLM with both LMM and VC, illustrated with data, is found in the section on linear mixed models. While both GLM and LMM accept the use of random effects in models, LMM is preferred when random effects are present for reasons given in the comparison.


Table of Contents
Overview
Key Concepts
Why testing means is related to variance in analysis of variance
One-way anova
Simple one-way anova in SPSS
Simple one-way anova in SAS
Two-way anova
Two-way anova in SPSS
Two-way anova in SAS
Multivariate or n-way anova
Regression models
Parameter estimates (b coefficients) for factor levels
Parameter estimates for dichotomies
Significance of parameter estimates
Research designs
Between-groups anova design
Completely randomized design
Full factorial anova
Balanced designs
Latin square designs
Graeco-Latin square designs
Randomized Complete Block Design (RCBD anova)
Split plot designs
Mixed design models
Random v. fixed effects models
In SPSS
In SAS
Linear mixed models (LMM) vs. general linear models (GLM)
Effects
Treating a random factor as a fixed factor
Mixed effects models
Nested designs
Nested designs
In SPSS
In SAS
Treatment by replication design
Within-groups (repeated measures) anova designs
Counterbalancing
Reliability procedure
Repeated measures GLM in SPSS
Repeated measures GLM in SAS
Interpreting repeated measures output
Variables
Types of variables
Dependent variable
Fixed and random factors
Covariates
WLS weights
Models and types of effects
Full factorial models
Effects
Main effects
Interaction effects
Residual effects
Effect size measures
Effect size coefficients based on percent of variance explained
Partial eta-squared
Omega-squared
Herzberg's R2
Intraclass correlation
Effect size coefficients based on standardized mean differences
Cohen's d
Glass's delta
Hedges' g
Significance tests
F-test
Reading the F value
Example 1
Example 2
Significance in two-way anova
Computation of F
F-test assumptions
Adjusted means
Lack of fit test
Power level and noncentrality parameter
Hotelling's T-Square
Planned multiple comparison t-tests
Simple t-test difference of means
Bonferroni-adjusted t-test
Sidak test
Dunnett's test
Hsu's multiple comparison with the best (MCB) test
Post-hoc multiple comparison tests
The q-statistic
Output formats: pairwise vs. multiple range
Tests assuming equal variances
Least significant difference (LSD) test
The Fisher-Hayter test
Tukey's test, a.k.a. Tukey honestly significant difference (HSD) test
Tukey-b test, a.k.a. Tukey's wholly significant difference (WSD) test
S-N-K or Student-Newman-Keuls test
Duncan test
Ryan test (REGWQ)
The Shaffer-Ryan test
The Scheffé test
Hochberg GT2 test
Gabriel test
Waller-Duncan test
Tests not assuming equal variances
Tamhane's T2 test
Games-Howell test
Dunnett's T3 test and Dunnett's C test
The Tukey-Kramer test
The Miller-Winer test
More than one multiple comparison/post hoc test
Example
Contrast tests
Overview
Types of contrasts
Deviation contrasts
Simple contrasts
Difference contrasts
Helmert contrasts
Repeated contrasts
Polynomial contrasts
Custom hypothesis tables
Custom hypothesis tables index table
Custom hypothesis tables
Estimated marginal means
Overview
EMM Estimates table
Other EMM output
EMM Pairwise comparisons table
EMM Univariate tests table
Profile plots
GLM Repeated Measures
Overview
Key Terms and Concepts
Within-subjects factor
Repeated measures dependent variables
Between-subjects factors
Covariates
Models
Type of sum of squares
Balanced vs. unbalanced models
Estimated marginal means
Pairwise comparisons
Statistics options in SPSS
Descriptive statistics
Hypothesis SSCP matrices
Partial eta-squared
Within-subjects SSCP matrix and within-subjects contrast effects
Multivariate tests
Univariate vs. multivariate models
Box's M test
Mauchly's test of sphericity
Univariate tests of within-subjects effects
Parameter estimates
Levene's test
Spread-versus-level plots
Residual plots
Lack of fit test
General estimable function
Post hoc tests
Overview
Profile plots for repeated measures GLM
Example
Contrast analysis for repeated measures GLM
Types of contrasts for repeated measures
Simple contrasts example
Saving variables in repeated measures GLM
Cook's distance
Leverage values
Assumptions
Interval data
Homogeneity of variances
Homogeneity of variance
Appropriate sums of squares
Multivariate normality
Adequate sample size
Equal or similar sample sizes
Random sampling
Orthogonal error
Data independence
Recursive models
Categorical independent variables
The independent variable is or variables are categorical.
Continuous dependent variables
Non-significant outliers
Sphericity
Assumptions related to ancova:
Limited number of covariates
Low measurement error of the covariate
Covariates are linearly related or in a known relationship to the dependent
Homogeneity of covariate regression coefficients
No covariate outliers
No high multicollinearity of the covariates
Additivity
Assumptions for repeated measures
Frequently Asked Questions
How do you interpret an anova table?
Isn't anova just for experimental research designs?
Should I standardize my data before using anova or ancova?
Since orthogonality (uncorrelated independents) is an assumption, and since this is rare in real life topics of interest to social scientists, shouldn't regression models be used instead of anova models?
Couldn't I just use several t-tests to compare means instead of anova?
How does counterbalancing work in repeated measures designs?
How is F computed in random effect designs?
What designs are available in anova for correlated independents?
If the assumption of homogeneity of variances is not met, should regression models be used instead?
Is anova a linear procedure like regression? How is linearity related to the "Contrasts" option?
What is hierarchical anova or ancova?
Is there a limit on the number of independents which can be included in an analysis of variance?
Which SPSS procedures compute anova?
I have several independent variables, which means there are a very large number of possible interaction effects. Does SPSS have to compute them all?
Do you use the same designs (between groups, repeated measures, etc.) with ancova as you do with anova?
How is GLM ancova different from traditional ancova?
What are paired comparisons (planned or post hoc) in ancova?
Can ancova be modeled using regression?
How does blocking with anova compare to ancova?
What is the SPSS syntax for GLM repeated measures?
What is a "doubly repeated measures design"?
Bibliography