LOGISTIC REGRESSION

Overview

Binary logistic regression is a form of regression used when the dependent variable is a dichotomy and the independent variables are of any type. Multinomial logistic regression handles dependents with more than two classes, though it is sometimes also used for binary dependents because it generates somewhat different output, described below. When the classes of a multinomial dependent variable can be ranked, ordinal logistic regression is preferred to multinomial logistic regression because it has higher power for ordinal data. Note that continuous variables are not used as dependents in logistic regression. Unlike logit regression, logistic regression allows only one dependent variable.

More recently, generalized linear modeling (GZLM) has appeared as a module in SPSS, SAS, and other packages. GZLM allows the researcher to create regression models with any distribution of the dependent (e.g., binary, multinomial, ordinal) and any link function (e.g., log for loglinear analysis, logit for binary or multinomial logistic analysis, cumulative logit for ordinal logistic analysis). Similarly, generalized linear mixed modeling (GLMM) is now available to handle multilevel logistic modeling.

Logistic regression can be used to predict a categorical dependent variable on the basis of continuous and/or categorical independents; to determine the effect size of the independent variables on the dependent; to rank the relative importance of independents; to assess interaction effects; and to understand the impact of covariate control variables. The impact of predictor variables is usually explained in terms of odds ratios.
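As a rough illustration of what an odds ratio expresses, the sketch below computes one from a 2x2 table. The counts and the function name odds_ratio are hypothetical and chosen only for the example. For a model with a single binary predictor, the exponentiated logistic coefficient equals this ratio.

```python
def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the event in group A divided by the odds in group B."""
    odds_a = events_a / non_events_a
    odds_b = events_b / non_events_b
    return odds_a / odds_b


# Hypothetical counts: 30 of 50 cases in group A show the outcome,
# 15 of 50 cases in group B show it.
or_value = odds_ratio(30, 20, 15, 35)

# Odds are 1.5 in group A and about 0.43 in group B, so the odds of the
# outcome are about 3.5 times higher in group A than in group B.
print(round(or_value, 3))
```

An odds ratio of 1 indicates no difference in odds between the groups; values above 1 indicate higher odds in the first group, and values below 1 indicate lower odds.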

Logistic regression applies maximum likelihood estimation after transforming the dependent into a logit variable. A logit is the natural log of the odds of the dependent equaling a certain value or not (usually 1 in binary logistic models, or the highest value in multinomial models). Logistic regression estimates the odds of a certain event (value) occurring. This means that logistic regression calculates changes in the log odds of the dependent, not changes in the dependent itself as OLS regression does.
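The relationship among probabilities, odds, and logits can be sketched in a few lines. The function names below are illustrative and are not taken from any statistics package.

```python
import math


def odds_from_probability(p):
    """Odds of the event: probability it occurs over probability it does not."""
    return p / (1.0 - p)


def logit(p):
    """Natural log of the odds."""
    return math.log(odds_from_probability(p))


def probability_from_logit(z):
    """Inverse of the logit: maps any real-valued log odds back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.1, 0.5, 0.8):
    z = logit(p)
    # Converting the logit back recovers the original probability.
    print(p, round(odds_from_probability(p), 4), round(z, 4),
          round(probability_from_logit(z), 4))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0. Because the logit is unbounded in both directions, a linear model on the logit scale always converts back to a valid probability.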

Logistic regression has many analogies to OLS regression: logit coefficients correspond to b coefficients in the logistic regression equation, the standardized logit coefficients correspond to beta weights, and a pseudo R-squared statistic is available to summarize the strength of the relationship. Unlike OLS regression, however, logistic regression does not assume a linear relationship between the raw values of the independent variables and raw values of the dependent, does not require normally distributed variables, does not assume homoscedasticity, and in general has less stringent requirements. It does, however, require that observations be independent and that the independent variables be linearly related to the logit of the dependent. The predictive success of the logistic regression can be assessed by looking at the classification table, showing correct and incorrect classifications of the dichotomous, ordinal, or polytomous dependent. Goodness-of-fit tests such as the likelihood ratio test are available as indicators of model appropriateness, as is the Wald statistic to test the significance of individual independent variables.
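The quantities named above can be illustrated with a minimal sketch that fits a one-predictor binary model by maximum likelihood using Newton-Raphson iteration, then computes a likelihood ratio test, a Wald test, McFadden's pseudo R-squared (one of several pseudo R-squared measures), and a classification table. The data are made up for the example, the function names are hypothetical, and the code is not how SPSS or SAS implement these procedures.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(x, y, b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def score_and_information(x, y, b0, b1):
    """Gradient of the log likelihood and the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit(p) = b0 + b1 * x; returns estimates and SEs."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard errors come from the inverse information matrix at the estimates.
    _, _, h00, h01, h11 = score_and_information(x, y, b0, b1)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)


# Hypothetical data: one continuous predictor, one dichotomous dependent.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1, se0, se1 = fit_logistic(x, y)
ll_model = log_likelihood(x, y, b0, b1)

# Null (intercept-only) model: every case gets the overall proportion of 1s.
p_bar = sum(y) / len(y)
ll_null = sum(yi * math.log(p_bar) + (1 - yi) * math.log(1 - p_bar) for yi in y)

# Likelihood ratio chi-square with 1 degree of freedom.
lr_chi2 = 2.0 * (ll_model - ll_null)
lr_p = math.erfc(math.sqrt(lr_chi2 / 2.0))

# Wald test for the slope.
wald_z = b1 / se1
wald_p = math.erfc(abs(wald_z) / math.sqrt(2.0))

mcfadden_r2 = 1.0 - ll_model / ll_null

# Classification table at a 0.5 cutoff, keyed by (observed, predicted).
table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
for xi, yi in zip(x, y):
    predicted = 1 if sigmoid(b0 + b1 * xi) >= 0.5 else 0
    table[(yi, predicted)] += 1

print("b1 =", round(b1, 4), "odds ratio =", round(math.exp(b1), 4))
print("LR chi-square =", round(lr_chi2, 4), "p =", round(lr_p, 4))
print("Wald z =", round(wald_z, 4), "p =", round(wald_p, 4))
print("McFadden R-squared =", round(mcfadden_r2, 4))
print("classification table:", table)
```

With a sample this small, the likelihood ratio and Wald tests can disagree noticeably; the sketch is meant only to show how the pieces relate, not to recommend sample sizes or cutoffs.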

Logit regression, discussed separately, is another related option in SPSS and other statistics packages for using loglinear methods to analyze one or more dependents. Where both are applicable, logit regression has numerically equivalent results to logistic regression, but with different output options. For the same class of problems, logistic regression has become more popular among social scientists.

The full content is now available from Statistical Associates Publishers.

Below is the table of contents.

Overview
Key Terms and Concepts
    Binary, binomial, and multinomial logistic regression
    The logistic model
    The logistic equation
    The dependent variable
    Factors
    Covariates and Interaction Terms
    Estimation
A basic binary logistic regression model in SPSS
    Example
    Omnibus tests of model coefficients
    Model summary
    Classification table
    Variables in the equation table
    Optional output
    Classification plot
    Hosmer and Lemeshow test of goodness of fit
    Casewise listing of residuals for outliers > 2 standard deviations
A basic binary logistic regression model in SAS
    SAS syntax
    Reconciling SAS and SPSS output
    Statistical Output in SAS
    Global null hypothesis tests
    Model fit statistics
    The classification table
    The association of predicted probabilities and observed responses table
    Analysis of parameter estimates
    Odds ratio estimates
    Hosmer and Lemeshow test of goodness of fit
    Regression diagnostics table
A basic multinomial logistic regression model in SPSS
    Example
    Model
    Default statistical output
    Pseudo R-square
    Step summary
    Model fitting information table
    Goodness of fit tests
    Likelihood ratio tests
    Parameter estimates
    Optional statistical output for multinomial regression in SPSS
    Classification table
    Observed and expected frequencies
    Asymptotic correlation matrix
A basic multinomial logistic regression model in SAS
    Example
    SAS syntax
    Statistical output for multinomial regression in SAS
    Maximum likelihood anova table
    Maximum likelihood estimates table
Parameter Estimates and Odds Ratios
    Parameter estimates and odds ratios in binary logistic regression
    Example
    A second binary example
    Parameter estimates and odds ratios in multinomial logistic regression
    Example
    A second example
    Logistic coefficients and correlation
    Reporting odds ratios
    Odds ratios: summary
    Effect size
    Confidence interval on the odds ratio
    Warning: very high or low odds ratios
    Comparing the change in odds for different values of X
    Comparing the change in odds when interaction terms are in the model
    Probabilities, logits, and odds ratios
    Probabilities
    Relative risk ratios (RRR)
    Logistic coefficients and logits
    Parameter estimate for the intercept
    Logits
Significance Tests
    Significance tests for binary logistic regression
    Omnibus tests of model coefficients
    Hosmer and Lemeshow test of goodness of fit
    Fit tests in stepwise or block-entry logistic regression
    Wald tests for variables in the model
    Significance tests for multinomial logistic regression
    Likelihood ratio test of the model
    Wald tests of parameters
    Goodness of fit tests
    Likelihood ratio tests
    Testing individual model parameters
    Goodness of Fit Index (obsolete)
Effect Size Measures
    Effect size for the model
    Pseudo R-squared
    Classification tables
    The c statistic
    Information theory measures of model fit
    Effect size for parameters
    Odds ratios
    Standardized logistic coefficients
Stepwise logistic regression
    Overview
    Forward selection vs. backward elimination
    Cross-validation
    Rao's efficient score as a variable entry criterion for forward selection
    Which step is the best model?
Contrast Analysis
    Repeated contrasts
    Indicator contrasts
    Contrasts and ordinality
Analysis of residuals
    Overview
    Residual analysis in binary logistic regression
    Outliers
    The dbeta statistic
    The leverage statistic
    Cook's distance
    Residual analysis in multinomial logistic regression
Conditional logistic regression for matched pairs data
    Overview
    Data setup
    SPSS dialogs
    Output
Assumptions
    Data level
    Meaningful coding
    Proper specification of the model
    Independence of irrelevant alternatives
    Error terms are assumed to be independent (independent sampling)
    Low error in the explanatory variables
    Linearity
    Additivity
    Absence of perfect separation
    Absence of perfect multicollinearity
    Absence of high multicollinearity
    Centered variables
    No outliers
    Sample size
    Sampling adequacy
    Expected dispersion
Frequently Asked Questions
    How should logistic regression results be reported?
    Why not just use regression with dichotomous dependents?
    When is OLS regression preferred over logistic regression?
    When is discriminant analysis preferred over logistic regression?
    What is the SPSS syntax for logistic regression?
    Can I create interaction terms in my logistic model, as with OLS regression?
    Will SPSS's logistic regression procedure handle my categorical variables automatically?
    Can I handle missing cases the same in logistic regression as in OLS regression?
    Explain the error message I am getting in SPSS about cells with zero frequencies.
    Is it true for logistic regression, as it is for OLS regression, that the beta weight (standardized logit coefficient) for a given independent reflects its explanatory power controlling for other variables in the equation, and that the betas will change if variables are added or dropped from the equation?
    What is the coefficient in logistic regression which corresponds to R-Square in multiple regression?
    Is there a logistic regression analogy to adjusted R-square in OLS regression?
    Is multicollinearity a problem for logistic regression the way it is for multiple linear regression?
    What is the logistic equivalent to the VIF test for multicollinearity in OLS regression? Can odds ratios be used?
    How can one use estimated variance of residuals to test for model misspecification?
    How are interaction effects handled in logistic regression?
    Does stepwise logistic regression exist, as it does for OLS regression?
    What are the stepwise options in multinomial logistic regression in SPSS?
    What if I use the multinomial logistic option when my dependent is binary?
    What is nonparametric logistic regression and how is it more nonlinear?
    How many independent variables can I have?
    How do I express the logistic regression equation if one or more of my independent variables is categorical?
    How do I compare logit coefficients across groups formed by a categorical independent variable?
    How do I compute the confidence interval for the unstandardized logit (effect) coefficients?
    What is the STATA approach to multinomial logistic regression?
Bibliography