LOGISTIC REGRESSION

Overview

Binary logistic regression is a form of regression used when the dependent variable is a dichotomy and the independent variables are of any type. Multinomial logistic regression handles dependents with more than two classes, though it is sometimes also used for binary dependents because it generates somewhat different output, described below. When the classes of a multinomial dependent variable can be ranked, ordinal logistic regression is preferred to multinomial logistic regression because it has higher power for ordinal data. Note that continuous variables are not used as dependents in logistic regression. Unlike logit regression, logistic regression allows only one dependent variable.

More recently, generalized linear modeling (GZLM) has appeared as a module in SPSS, SAS, and other packages. GZLM allows the researcher to create regression models with any distribution of the dependent (e.g., binary, multinomial, ordinal) and any link function (e.g., log for loglinear analysis, logit for binary or multinomial logistic analysis, cumulative logit for ordinal logistic analysis). Similarly, generalized linear mixed modeling (GLMM) is now available to handle multilevel logistic modeling.

Logistic regression can be used to predict a categorical dependent variable on the basis of continuous and/or categorical independents; to determine the effect size of the independent variables on the dependent; to rank the relative importance of independents; to assess interaction effects; and to understand the impact of covariate control variables. The impact of predictor variables is usually explained in terms of odds ratios.
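As an illustration of what an odds ratio expresses, the following minimal sketch computes one directly from a 2x2 table for a single binary predictor. The counts and the function names (`odds`, `odds_ratio`) are invented for illustration and do not come from any example in the text. For a model with one binary predictor, the natural log of this ratio is the logit coefficient for that predictor.

```python
import math


def odds(events, non_events):
    """Odds of the event: cases with the event divided by cases without it."""
    return events / non_events


def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Ratio of the odds in group A to the odds in group B."""
    return odds(events_a, non_events_a) / odds(events_b, non_events_b)


# Hypothetical counts, invented for illustration only.
# Group A (predictor = 1): 40 cases with the event, 60 without.
# Group B (predictor = 0): 20 cases with the event, 80 without.
ratio = odds_ratio(40, 60, 20, 80)

# With a single binary predictor, the logit coefficient equals ln(odds ratio).
logit_coefficient = math.log(ratio)

print(f"odds ratio = {ratio:.3f}")
print(f"logit coefficient = {logit_coefficient:.3f}")
```

An odds ratio above 1 means the odds of the event are higher in group A than in group B; a value of 1 means the predictor makes no difference to the odds.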

Logistic regression applies maximum likelihood estimation after transforming the dependent into a logit variable. A logit is the natural log of the odds of the dependent equaling a certain value or not (usually 1 in binary logistic models, or the highest value in multinomial models). Logistic regression estimates the odds of a certain event (value) occurring. This means that logistic regression calculates changes in the log odds of the dependent, not changes in the dependent itself as OLS regression does.
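The distinction between changes in the log odds and changes in the dependent itself can be seen in a short sketch. It assumes a one-predictor model with hypothetical coefficients `B0` and `B1` (invented here, not estimated from any data): a one-unit increase in x always shifts the logit by `B1`, while the corresponding change in the predicted probability depends on where x starts.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def log_odds(b0, b1, x):
    """Linear predictor: the logit of the dependent at a given x."""
    return b0 + b1 * x


def probability(b0, b1, x):
    """Inverse of the logit: converts log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds(b0, b1, x)))


# Hypothetical coefficients, chosen only for illustration.
B0, B1 = -2.0, 0.5

for x in (0, 2, 4):
    change_in_logit = log_odds(B0, B1, x + 1) - log_odds(B0, B1, x)
    change_in_prob = probability(B0, B1, x + 1) - probability(B0, B1, x)
    print(f"x={x}: change in logit={change_in_logit:.3f}, "
          f"change in probability={change_in_prob:.3f}")
```

The change in the logit is the same on every row, whereas the change in probability is not, which is why logit coefficients are interpreted on the log-odds (or, after exponentiation, the odds) scale.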

Logistic regression has many analogies to OLS regression: logit coefficients play the role in the logistic regression equation that b coefficients play in OLS regression, the standardized logit coefficients correspond to beta weights, and a pseudo R-squared statistic is available to summarize the strength of the relationship. Unlike OLS regression, however, logistic regression does not assume linearity of relationship between the raw values of the independent variables and raw values of the dependent, does not require normally distributed variables, does not assume homoscedasticity, and in general has less stringent requirements. It does, however, require that observations be independent and that the independent variables be linearly related to the logit of the dependent. The predictive success of the logistic regression can be assessed by looking at the classification table, which shows correct and incorrect classifications of the dichotomous, ordinal, or polytomous dependent. Goodness-of-fit tests such as the likelihood ratio test are available as indicators of model appropriateness, as is the Wald statistic to test the significance of individual independent variables.
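A classification table for a binary dependent can be sketched as follows. The observed values and predicted probabilities are invented for illustration, and the 0.5 cutoff is an assumption (a common default, but any cutoff can be passed in); the function names are hypothetical rather than those of any statistics package.

```python
def classification_table(observed, predicted_probs, cutoff=0.5):
    """Cross-tabulate observed 0/1 values against predicted 0/1 values.

    Keys of the returned dict are (observed, predicted) pairs.
    A case is predicted as 1 when its probability is at or above the cutoff.
    """
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y, p in zip(observed, predicted_probs):
        predicted = 1 if p >= cutoff else 0
        table[(y, predicted)] += 1
    return table


def percent_correct(table):
    """Overall percentage of cases on the diagonal of the table."""
    total = sum(table.values())
    return 100.0 * (table[(0, 0)] + table[(1, 1)]) / total


# Hypothetical data, invented for illustration only.
observed = [0, 0, 0, 1, 1, 1, 1, 0]
predicted_probs = [0.1, 0.4, 0.7, 0.8, 0.6, 0.3, 0.9, 0.2]

table = classification_table(observed, predicted_probs)
print("observed 0:", table[(0, 0)], "predicted 0,", table[(0, 1)], "predicted 1")
print("observed 1:", table[(1, 0)], "predicted 0,", table[(1, 1)], "predicted 1")
print("percent correct:", percent_correct(table))
```

Because the table depends on the cutoff, the percent correct is best read alongside the other fit measures rather than on its own.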

Logit regression, discussed separately, is another related option in SPSS and other statistics packages for using loglinear methods to analyze one or more dependents. Where both are applicable, logit regression has numerically equivalent results to logistic regression, but with different output options. For the same class of problems, logistic regression has become more popular among social scientists.

The full content is now available from Statistical Associates Publishers.

```
LOGISTIC REGRESSION

Overview
Key Terms and Concepts
Binary, binomial, and multinomial logistic regression
The logistic model
The logistic equation
The dependent variable
Factors
Covariates and Interaction Terms
Estimation
A basic binary logistic regression model in SPSS
Example
Omnibus tests of model coefficients
Model summary
Classification table
Variables in the equation table
Optional output
Classification plot
Hosmer and Lemeshow test of goodness of fit
Casewise listing of residuals for outliers > 2 standard deviations
A basic binary logistic regression model in SAS
SAS syntax
Reconciling SAS and SPSS output
Statistical Output in SAS
Global null hypothesis tests
Model fit statistics
The classification table
The association of predicted probabilities and observed responses table
Analysis of parameter estimates
Odds ratio estimates
Hosmer and Lemeshow test of goodness of fit
Regression diagnostics table
A basic multinomial logistic regression model in SPSS
Example
Model
Default statistical output
Pseudo R-square
Step summary
Model fitting information table
Goodness of fit tests
Likelihood ratio tests
Parameter estimates
Optional statistical output for multinomial regression in SPSS
Classification table
Observed and expected frequencies
Asymptotic correlation matrix
A basic multinomial logistic regression model in SAS
Example
SAS syntax
Statistical output for multinomial regression in SAS
Maximum likelihood anova table
Maximum likelihood estimates table
Parameter Estimates and Odds Ratios
Parameter estimates and odds ratios in binary logistic regression
Example
A second binary example
Parameter estimates and odds ratios in multinomial logistic regression
Example
A second example
Logistic coefficients and correlation
Reporting odds ratios
Odds ratios: summary
Effect size
Confidence interval on the odds ratio
Warning: very high or low odds ratios
Comparing the change in odds for different values of X
Comparing the change in odds when interaction terms are in the model
Probabilities, logits, and odds ratios
Probabilities
Relative risk ratios (RRR)
Logistic coefficients and logits
Parameter estimate for the intercept
Logits
Significance Tests
Significance tests for binary logistic regression
Omnibus tests of model coefficients
Hosmer and Lemeshow test of goodness of fit
Fit tests in stepwise or block-entry logistic regression
Wald tests for variables in the model
Significance tests for multinomial logistic regression
Likelihood ratio test of the model
Wald tests of parameters
Goodness of fit tests
Likelihood ratio tests
Testing individual model parameters
Goodness of Fit Index (obsolete)
Effect Size Measures
Effect size for the model
Pseudo R-squared
Classification tables
The c statistic
Information theory measures of model fit
Effect size for parameters
Odds ratios
Standardized logistic coefficients
Stepwise logistic regression
Overview
Forward selection vs. backward elimination
Cross-validation
Rao's efficient score as a variable entry criterion for forward selection
Which step is the best model?
Contrast Analysis
Repeated contrasts
Indicator contrasts
Contrasts and ordinality
Analysis of residuals
Overview
Residual analysis in binary logistic regression
Outliers
The dbeta statistic
The leverage statistic
Cook's distance
Residual analysis in multinomial logistic regression
Conditional logistic regression for matched pairs data
Overview
Data setup
SPSS dialogs
Output
Assumptions
Data level
Meaningful coding
Proper specification of the model
Independence of irrelevant alternatives
Error terms are assumed to be independent (independent sampling)
Low error in the explanatory variables
Linearity
Absence of perfect separation
Absence of perfect multicollinearity
Absence of high multicollinearity
Centered variables
No outliers
Sample size
Expected dispersion
Frequently Asked Questions
How should logistic regression results be reported?
Why not just use regression with dichotomous dependents?
When is OLS regression preferred over logistic regression?
When is discriminant analysis preferred over logistic regression?
What is the SPSS syntax for logistic regression?
Can I create interaction terms in my logistic model, as with OLS regression?
Will SPSS's logistic regression procedure handle my categorical variables automatically?
Can I handle missing cases the same in logistic regression as in OLS regression?
Explain the error message I am getting in SPSS about cells with zero frequencies.
Is it true for logistic regression, as it is for OLS regression, that the beta weight (standardized logit coefficient) for a given independent reflects its explanatory power controlling for other variables in the equation, and that the betas will change if variables are added or dropped from the equation?
What is the coefficient in logistic regression which corresponds to R-Square in multiple regression?
Is there a logistic regression analogy to adjusted R-square in OLS regression?
Is multicollinearity a problem for logistic regression the way it is for multiple linear regression?
What is the logistic equivalent to the VIF test for multicollinearity in OLS regression? Can odds ratios be used?
How can one use estimated variance of residuals to test for model misspecification?
How are interaction effects handled in logistic regression?
Does stepwise logistic regression exist, as it does for OLS regression?
What are the stepwise options in multinomial logistic regression in SPSS?
What if I use the multinomial logistic option when my dependent is binary?
What is nonparametric logistic regression and how is it more nonlinear?
How many independent variables can I have?
How do I express the logistic regression equation if one or more of my independent variables is categorical?
How do I compare logit coefficients across groups formed by a categorical independent variable?
How do I compute the confidence interval for the unstandardized logit (effect) coefficients?
What is the STATA approach to multinomial logistic regression?
Bibliography

```