This content is now available from Statistical Associates Publishers.

Below is the unformatted overview and table of contents.

Overview

Binary logistic regression is a form of regression used when the dependent variable is a dichotomy and the independent variables are of any type. Multinomial logistic regression handles dependents with more than two classes, though it is sometimes also used for binary dependents because it generates somewhat different output, described below. When the classes of a multinomial dependent variable can be ranked, ordinal logistic regression is preferred to multinomial logistic regression because it has higher power for ordinal data. Note that continuous variables are not used as dependents in logistic regression. Unlike logit regression, logistic regression allows only one dependent variable.

More recently, generalized linear modeling (GZLM) has appeared as a module in SPSS, SAS, and other packages. GZLM allows the researcher to create regression models with any distribution of the dependent (e.g., binary, multinomial, ordinal) and any link function (e.g., log for loglinear analysis, logit for binary or multinomial logistic analysis, cumulative logit for ordinal logistic analysis). Similarly, generalized linear mixed modeling (GLMM) is now available to handle multilevel logistic modeling.
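
The three link functions named above can be sketched in a few lines of Python. This is only an illustration of the transformations themselves, not of how SPSS or SAS implement GZLM; the function names are hypothetical and chosen for this example.

```python
import math

def log_link(mu):
    """Log link: maps a positive mean (e.g., an expected count) to the real line."""
    return math.log(mu)

def logit_link(p):
    """Logit link: maps a probability in (0, 1) to the log odds."""
    return math.log(p / (1.0 - p))

def cumulative_logit_link(category_probs):
    """Cumulative logit link for an ordinal outcome.

    Takes the probabilities of the ordered categories (summing to 1) and
    returns the logit of P(Y <= j) for every category j except the last,
    whose cumulative probability is always 1.
    """
    logits = []
    cumulative = 0.0
    for p in category_probs[:-1]:
        cumulative += p
        logits.append(logit_link(cumulative))
    return logits

# Illustrative values only: a three-category ordinal outcome.
print(cumulative_logit_link([0.2, 0.3, 0.5]))
```

An ordinal dependent with k categories yields k - 1 cumulative logits, which is why the last category is skipped.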

Logistic regression can be used to predict a categorical dependent variable on the basis of continuous and/or categorical independents; to determine the effect size of the independent variables on the dependent; to rank the relative importance of independents; to assess interaction effects; and to understand the impact of covariate control variables. The impact of predictor variables is usually explained in terms of odds ratios. 
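
As a minimal sketch of how an odds ratio is obtained from a logit coefficient, the Python snippet below exponentiates the coefficient. The coefficient value is hypothetical and the function names are invented for this illustration.

```python
import math

def odds_ratio(logit_coefficient, unit_change=1.0):
    """Odds ratio for a given change in the predictor: exp(b * change)."""
    return math.exp(logit_coefficient * unit_change)

def percent_change_in_odds(logit_coefficient, unit_change=1.0):
    """Percentage change in the odds of the dependent for the same change."""
    return 100.0 * (odds_ratio(logit_coefficient, unit_change) - 1.0)

b = 0.693  # hypothetical logit coefficient, not taken from any real model
print(round(odds_ratio(b), 2))              # about 2: the odds roughly double per unit of X
print(round(percent_change_in_odds(-b), 1)) # a negative coefficient lowers the odds
```

An odds ratio of 1 corresponds to a coefficient of 0, meaning the predictor does not change the odds of the dependent.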

Logistic regression applies maximum likelihood estimation after transforming the dependent into a logit variable. A logit is the natural log of the odds of the dependent equaling a certain value or not (usually 1 in binary logistic models, or the highest value in multinomial models). Logistic regression estimates the odds of a certain event (value) occurring. This means that logistic regression calculates changes in the log odds of the dependent, not changes in the dependent itself as OLS regression does. 
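
The logit transformation and maximum likelihood estimation described above can be illustrated with a small pure-Python sketch. It fits a binary model with one predictor by Newton-Raphson iteration, which is one common way to maximize the likelihood; it is not a description of the algorithm used by any particular package. The data are made up for the example and all function names are hypothetical.

```python
import math

def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Convert a log odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(b0, b1, xs, ys):
    """Binomial log-likelihood of the model logit(p) = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = inverse_logit(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_binary_logistic(xs, ys, iterations=25):
    """Maximum likelihood estimates of (b0, b1) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = inverse_logit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score (gradient) for the intercept
            g1 += (y - p) * x    # score (gradient) for the slope
            h00 += w             # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up data: the outcome becomes more likely as x increases.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_binary_logistic(xs, ys)
print("intercept:", b0, "slope (change in log odds per unit of x):", b1)
```

The fitted slope is a change in the log odds of the dependent per unit of x, not a change in the dependent itself. The sketch assumes the data are not perfectly separated; with perfect separation the estimates do not converge.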

Logistic regression has many analogies to OLS regression: the logit coefficients in the logistic regression equation correspond to b coefficients in OLS regression, the standardized logit coefficients correspond to beta weights, and a pseudo R-squared statistic is available to summarize the strength of the relationship. Unlike OLS regression, however, logistic regression does not assume a linear relationship between the raw values of the independent variables and the raw values of the dependent, does not require normally distributed variables, does not assume homoscedasticity, and in general has less stringent requirements. It does, however, require that observations be independent and that the independent variables be linearly related to the logit of the dependent. The predictive success of the logistic regression can be assessed by examining the classification table, which shows correct and incorrect classifications of the dichotomous, ordinal, or polytomous dependent. Goodness-of-fit tests such as the likelihood ratio test are available as indicators of model appropriateness, as is the Wald statistic to test the significance of individual independent variables.
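
The classification table, likelihood ratio test, and Wald statistic mentioned above can be sketched in Python as follows. The inputs are hypothetical, the function names are invented for this illustration, and the 0.5 cutoff is an assumption (it is a common default, but the appropriate cutoff depends on the application). The p-value helper covers only the one-degree-of-freedom case, which applies to a single coefficient or a single added parameter.

```python
import math

def classification_table(observed, predicted_probs, cutoff=0.5):
    """Counts keyed by (observed, predicted) for a binary dependent."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y, p in zip(observed, predicted_probs):
        predicted = 1 if p >= cutoff else 0
        table[(y, predicted)] += 1
    return table

def percent_correct(table):
    """Overall percentage of cases classified correctly."""
    correct = table[(0, 0)] + table[(1, 1)]
    return 100.0 * correct / sum(table.values())

def likelihood_ratio_chi2(ll_null, ll_model):
    """Difference in -2 log-likelihood between a null and a fitted model."""
    return -2.0 * (ll_null - ll_model)

def wald_chi2(coefficient, standard_error):
    """Wald chi-square for one coefficient: (b / SE) squared."""
    return (coefficient / standard_error) ** 2

def chi2_p_value_1df(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical observed outcomes and predicted probabilities.
observed = [0, 0, 1, 1, 1, 0]
probs = [0.2, 0.6, 0.7, 0.4, 0.9, 0.1]
table = classification_table(observed, probs)
print(table, percent_correct(table))

# Hypothetical coefficient and standard error.
w = wald_chi2(1.0, 0.5)
print(w, chi2_p_value_1df(w))
```

For a likelihood ratio test of a whole model, the degrees of freedom equal the number of parameters added relative to the null model, so the one-degree-of-freedom helper applies only when a single parameter is tested.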

Logit regression, discussed separately, is a related option in SPSS and other statistics packages for using loglinear methods to analyze one or more dependents. Where both are applicable, logit regression yields results numerically equivalent to those of logistic regression, but with different output options. For the same class of problems, logistic regression has become more popular among social scientists.


Table of Contents

Overview
Key Terms and Concepts
Binary, binomial, and multinomial logistic regression
The logistic model
The logistic equation
The dependent variable
Factors
Covariates and Interaction Terms
Estimation
A basic binary logistic regression model in SPSS
Example
Omnibus tests of model coefficients
Model summary
Classification table
Variables in the equation table
Optional output
Classification plot
Hosmer and Lemeshow test of goodness of fit
Casewise listing of residuals for outliers > 2 standard deviations
A basic binary logistic regression model in SAS
SAS syntax
Reconciling SAS and SPSS output
Statistical Output in SAS
Global null hypothesis tests
Model fit statistics
The classification table
The association of predicted probabilities and observed responses table
Analysis of parameter estimates
Odds ratio estimates
Hosmer and Lemeshow test of goodness of fit
Regression diagnostics table
A basic multinomial logistic regression model in SPSS
Example
Model
Default statistical output
Pseudo R-square
Step summary
Model fitting information table
Goodness of fit tests
Likelihood ratio tests
Parameter estimates
Optional statistical output for multinomial regression in SPSS
Classification table
Observed and expected frequencies
Asymptotic correlation matrix
A basic multinomial logistic regression model in SAS
Example
SAS syntax
Statistical output for multinomial regression in SAS
Maximum likelihood anova table
Maximum likelihood estimates table
Parameter Estimates and Odds Ratios
Parameter estimates and odds ratios in binary logistic regression
Example
A second binary example
Parameter estimates and odds ratios in multinomial logistic regression
Example
A second example
Logistic coefficients and correlation
Reporting odds ratios
Odds ratios: summary
Effect size
Confidence interval on the odds ratio
Warning: very high or low odds ratios
Comparing the change in odds for different values of X
Comparing the change in odds when interaction terms are in the model
Probabilities, logits, and odds ratios
Probabilities
Relative risk ratios (RRR)
Logistic coefficients and logits
Parameter estimate for the intercept
Logits
Significance Tests
Significance tests for binary logistic regression
Omnibus tests of model coefficients
Hosmer and Lemeshow test of goodness of fit
Fit tests in stepwise or block-entry logistic regression
Wald tests for variables in the model
Significance tests for multinomial logistic regression
Likelihood ratio test of the model
Wald tests of parameters
Goodness of fit tests
Likelihood ratio tests
Testing individual model parameters
Goodness of Fit Index (obsolete)
Effect Size Measures
Effect size for the model
Pseudo R-squared
Classification tables
The c statistic
Information theory measures of model fit
Effect size for parameters
Odds ratios
Standardized logistic coefficients
Stepwise logistic regression
Overview
Forward selection vs. backward elimination
Cross-validation
Rao's efficient score as a variable entry criterion for forward selection
Which step is the best model?
Contrast Analysis
Repeated contrasts
Indicator contrasts
Contrasts and ordinality
Analysis of residuals
Overview
Residual analysis in binary logistic regression
Outliers
The dbeta statistic
The leverage statistic
Cook's distance
Residual analysis in multinomial logistic regression
Conditional logistic regression for matched pairs data
Overview
Data setup
SPSS dialogs
Output
Assumptions
Data level
Meaningful coding
Proper specification of the model
Independence of irrelevant alternatives
Error terms are assumed to be independent (independent sampling)
Low error in the explanatory variables
Linearity
Additivity
Absence of perfect separation
Absence of perfect multicollinearity
Absence of high multicollinearity
Centered variables
No outliers
Sample size
Sampling adequacy
Expected dispersion
Frequently Asked Questions
How should logistic regression results be reported?
Why not just use regression with dichotomous dependents?
When is OLS regression preferred over logistic regression?
When is discriminant analysis preferred over logistic regression?
What is the SPSS syntax for logistic regression?
Can I create interaction terms in my logistic model, as with OLS regression?
Will SPSS's logistic regression procedure handle my categorical variables automatically?
Can I handle missing cases the same in logistic regression as in OLS regression?
Explain the error message I am getting in SPSS about cells with zero frequencies.
Is it true for logistic regression, as it is for OLS regression, that the beta weight (standardized logit coefficient) for a given independent reflects its explanatory power controlling for other variables in the equation, and that the betas will change if variables are added or dropped from the equation?
What is the coefficient in logistic regression which corresponds to R-Square in multiple regression?
Is there a logistic regression analogy to adjusted R-square in OLS regression?
Is multicollinearity a problem for logistic regression the way it is for multiple linear regression?
What is the logistic equivalent to the VIF test for multicollinearity in OLS regression? Can odds ratios be used?
How can one use estimated variance of residuals to test for model misspecification?
How are interaction effects handled in logistic regression?
Does stepwise logistic regression exist, as it does for OLS regression?
What are the stepwise options in multinomial logistic regression in SPSS?
What if I use the multinomial logistic option when my dependent is binary?
What is nonparametric logistic regression and how is it more nonlinear?
How many independent variables can I have?
How do I express the logistic regression equation if one or more of my independent variables is categorical?
How do I compare logit coefficients across groups formed by a categorical independent variable?
How do I compute the confidence interval for the unstandardized logit (effect) coefficients?
What is the STATA approach to multinomial logistic regression?
Bibliography