Cox Regression

Throughout, italics indicate commentary by the author, not part of SPSS output.

Below, the Notes table shows we are using one of the sample data files which comes with SPSS: C:\Program Files\SPSS\Mouse survival.sav. The variables icrf and hxm are covariates; neither is time-dependent. The variable "time" is the time variable. The variable "status" is the status variable, with the event coded as 1, which is customary. For illustration purposes, all options are requested. The variable icrf is declared as categorical; it has six classes, which indicator coding represents as five 0/1 variables.

Notes
Output Created 06-NOV-2006 16:53:53
Comments
Input Data C:\Program Files\SPSS\Mouse survival.sav
Active Dataset DataSet1
Filter <none>
Weight <none>
Split File <none>
N of Rows in Working Data File 214
Missing Value Handling Definition of Missing User-defined missing values are treated as missing.
Syntax COXREG
time /STATUS=status(1)
/CONTRAST (icrf)=Indicator
/METHOD=ENTER icrf hxm
/PLOT SURVIVAL HAZARD LML OMS
/SAVE=SURVIVAL SE LML HAZARD PRESID DFBETA XBETA
/PRINT=CI(95) CORR SUMMARY
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .
Resources Elapsed Time 0:00:01.64
Variables Created or Modified SUR_1 Survival function evaluate at the current case
SE_1 Standard error of the survival function
LML_1 Log-minus-log-of-survival function
HAZ_1 Cumulative hazard function evaluate at the current case
PR1_1 Partial residual for icrf(1)
PR2_1 Partial residual for icrf(2)
PR3_1 Partial residual for icrf(3)
PR4_1 Partial residual for icrf(4)
PR5_1 Partial residual for icrf(5)
PR6_1 Partial residual for hxm
DFB1_1 Dfbeta for icrf(1)
DFB2_1 Dfbeta for icrf(2)
DFB3_1 Dfbeta for icrf(3)
DFB4_1 Dfbeta for icrf(4)
DFB5_1 Dfbeta for icrf(5)
DFB6_1 Dfbeta for hxm
XBE_1 X'Beta


[DataSet1] C:\Program Files\SPSS\Mouse survival.sav

Case Processing Summary


N Percent
Cases available in analysis Event(a) 211 98.6%
Censored 3 1.4%
Total 214 100.0%
Cases dropped Cases with missing values 0 .0%
Cases with negative time 0 .0%
Censored cases before the earliest event in a stratum 0 .0%
Total 0 .0%
Total 214 100.0%
a Dependent Variable: time

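Above, there are 214 cases: 211 events and 3 censored. The 211 uncensored cases supply the event terms used to estimate the covariate regression coefficients; the 3 censored cases still count as at risk until their censoring times. All cases are used to compute the baseline hazard function.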

Categorical Variable Codings(b)


Frequency (1) (2) (3) (4) (5)
icrf(a) .00 48 1 0 0 0 0
50.00 39 0 1 0 0 0
75.00 40 0 0 1 0 0
112.50 40 0 0 0 1 0
169.00 39 0 0 0 0 1
253.00 8 0 0 0 0 0
a Indicator Parameter Coding
b Category variable: icrf
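
The indicator coding in the table above can be sketched in a few lines of Python. This is only an illustration of the coding scheme, not SPSS code; the function name indicator_code is hypothetical, and the level values are taken from the table.

```python
# Indicator (dummy) coding with the last category as the reference,
# mirroring the Categorical Variable Codings table.
levels = [0.0, 50.0, 75.0, 112.5, 169.0, 253.0]  # icrf values from the table


def indicator_code(value, levels):
    """Return one 0/1 indicator per non-reference level (reference = last level)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value}")
    return [1 if value == level else 0 for level in levels[:-1]]


for level in levels:
    print(level, indicator_code(level, levels))
```

The reference category (253.00) is the row of all zeros, so each icrf coefficient later in the output is a contrast with that category.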


Block 0: Beginning Block

Omnibus Tests of Model Coefficients
-2 Log Likelihood
1901.334

In Block 0 above, the baseline -2 log likelihood is displayed. This is -2 times the log likelihood for the time-only null model in which all covariate regression coefficients are 0. This baseline is used in testing the significance of later models below.


Block 1: Method = Enter

Above, the Enter method forces all covariates into the model in a single block (as opposed to the Stepwise method).

Omnibus Tests of Model Coefficients(a,b)
-2 Log Likelihood Overall (score) Change From Previous Step Change From Previous Block
Chi-square df Sig. Chi-square df Sig. Chi-square df Sig.
1816.566 110.186 6 .000 84.767 6 .000 84.767 6 .000
a Beginning Block Number 0, initial Log Likelihood function: -2 Log likelihood: 1901.334
b Beginning Block Number 1. Method = Enter

Above, this table shows an overall score chi-square test, plus likelihood ratio tests (the change in -2 log likelihood) for change from the previous step and change from the previous block (if block entry is used; otherwise the block change is the same as the step change). In this case both changes are from the null model in Block 0. If the overall significance is .05 or less at any step, as it is here, the researcher rejects the null hypothesis that all covariate coefficients are zero; that is, at least one of the covariates in the model is significant.
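
As a check on the table, the change chi-square is simply the difference between the two -2 log likelihood values. The sketch below redoes that arithmetic in Python; the helper chi2_sf_even_df is a hypothetical name, and its closed form applies only to even degrees of freedom (6 here).

```python
import math

neg2ll_null = 1901.334   # Block 0, from the table
neg2ll_model = 1816.566  # Block 1, from the table
df = 6                   # five icrf indicators plus hxm

# Likelihood ratio chi-square: drop in -2 log likelihood from the null model.
lr_chi2 = neg2ll_null - neg2ll_model


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


p_value = chi2_sf_even_df(lr_chi2, df)
print(round(lr_chi2, 3), p_value)
```

The difference comes to about 84.77, agreeing with the 84.767 in the table up to rounding of the displayed values, and the p-value is far below .001, which SPSS prints as .000.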

Variables in the Equation

           B       SE     Wald     df   Sig.   Exp(B)   95.0% CI for Exp(B)
                                                        Lower    Upper
icrf                      94.124   5    .000
icrf(1)    2.490   .424   34.448   1    .000   12.058   5.250    27.691
icrf(2)    1.003   .398   6.349    1    .012   2.727    1.250    5.949
icrf(3)    .784    .405   3.744    1    .053   2.191    .990     4.848
icrf(4)    .593    .406   2.135    1    .144   1.809    .817     4.004
icrf(5)    .414    .401   1.069    1    .301   1.513    .690     3.320
hxm        -.014   .003   16.868   1    .000   .986     .980     .993


Above, B is the unstandardized regression coefficient, which cannot be used in its unstandardized form as an effect size measure since units of measurement and variances differ among variables. SPSS also displays the standard error of B (SE), the Wald statistic, its degrees of freedom (df), and the significance value of the coefficient, all interpreted as in logistic regression. If Sig. > .05 (the usual social science standard), then the covariate effect cannot be assumed to be different from zero. Conversely, if sig(Wald) < .05, then the researcher concludes the variable is useful to the model. Positive regression coefficients mean the covariate increases the hazard (for example, a greater risk that the event, status=1, occurs).
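
For a single coefficient, the Wald statistic is the squared ratio of B to its standard error. The sketch below recomputes it from the printed values; because the table rounds B and SE to three decimals, the results only approximate the Wald column, and the approximation is poor for very small values such as those for hxm.

```python
# B and SE as printed (three decimals) in Variables in the Equation.
rows = {
    "icrf(1)": (2.490, 0.424),
    "icrf(2)": (1.003, 0.398),
}


def wald(b, se):
    """Single-coefficient Wald chi-square with 1 df: (B / SE) squared."""
    return (b / se) ** 2


for name, (b, se) in rows.items():
    print(name, round(wald(b, se), 2))
```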

Exp(B) is the hazard ratio: the factor by which the predicted hazard is multiplied for each unit increase in the covariate. Because Exp(B) is close to 1.0 for hxm, we can conclude that hxm has only a very small effect on status per unit of hxm. Because that value, .986, is less than 1.0, the direction of the effect is toward reducing the hazard rate. The greatest effect is for the first category of icrf. Because the value 1.0 appears within the confidence intervals of all but the first two categories of the categorical variable icrf, only the first two can be considered to have an effect on status relative to the reference category.
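
The Exp(B) column and its confidence limits can be approximated from B and SE. The sketch below assumes the usual normal-approximation interval, exp(B ± 1.96 × SE); the function name hazard_ratio_ci is hypothetical, and small differences from the table reflect the rounding of the printed B and SE.

```python
import math


def hazard_ratio_ci(b, se, z=1.96):
    """Return (Exp(B), lower, upper) using a normal-approximation interval on B."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# B and SE as printed in Variables in the Equation.
for name, b, se in [("icrf(1)", 2.490, 0.424), ("hxm", -0.014, 0.003)]:
    ratio, lower, upper = hazard_ratio_ci(b, se)
    print(name, round(ratio, 3), round(lower, 3), round(upper, 3))
```

An interval that excludes 1.0, as both of these do, corresponds to a coefficient whose Wald test is significant at the .05 level.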

Above, icrf is categorical, so there is an overall row as well as a row for each category except the omitted reference category (here the last, which is the default). The overall Wald value tests the null hypothesis that all the effect coefficients for that categorical variable are zero. If the overall Wald significance is .05 or less, the researcher may conclude that at least one of the effect coefficients differs from zero, which it does here. Here the first two categories of icrf are significant.

Correlation Matrix of Regression Coefficients

          icrf(1)   icrf(2)   icrf(3)   icrf(4)   icrf(5)
icrf(2)   .817
icrf(3)   .837      .835
icrf(4)   .838      .830      .846
icrf(5)   .821      .825      .836      .837
hxm       -.365     -.189     -.290     -.292     -.223


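Above, the correlations between the hxm coefficient and the icrf coefficients are modest (-.189 to -.365), so we do not need to worry about multicollinearity between hxm and icrf.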

Covariate Means

Mean
icrf(1) .224
icrf(2) .182
icrf(3) .187
icrf(4) .187
icrf(5) .182
hxm 31.346


Above, baseline hazard, survival, and cumulative hazard rates are presented for a hypothetical case that scores at the mean on the covariate(s). This table lists those means.
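
To see what "scoring at the mean" amounts to, the linear predictor X'Beta can be evaluated at the covariate means. The sketch below uses the printed coefficients and means; it relies only on the Cox model form h(t|x) = h0(t)exp(x'B), and makes no claim about how SPSS internally centers its baseline.

```python
import math

# Coefficients from Variables in the Equation and means from Covariate Means.
b = {"icrf(1)": 2.490, "icrf(2)": 1.003, "icrf(3)": 0.784,
     "icrf(4)": 0.593, "icrf(5)": 0.414, "hxm": -0.014}
mean = {"icrf(1)": 0.224, "icrf(2)": 0.182, "icrf(3)": 0.187,
        "icrf(4)": 0.187, "icrf(5)": 0.182, "hxm": 31.346}

# Linear predictor for the hypothetical mean case.
xbeta_mean = sum(b[name] * mean[name] for name in b)

# Hazard of the mean case relative to a case with every covariate equal to 0
# (the icrf reference category with hxm = 0).
relative_hazard = math.exp(xbeta_mean)
print(round(xbeta_mean, 3), round(relative_hazard, 3))
```

Under the Cox model, the survival curve for the mean case is the zero-covariate survival curve raised to the power relative_hazard.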

Survival Function


This is the cumulative survival plot, with survival time on the X axis and cumulative survival on the Y axis. The curve represents a hypothetical individual with mean values on the covariates at any given time as represented on the X axis. The curve shows how cumulative survival decreases over time for such hypothetical individuals.

One Minus Survival Function


This is a plot of one-minus the survival function on a linear scale.

Hazard Function


This is the cumulative hazard plot, with survival time on the X axis and cumulative hazard on the Y axis. The curve represents a hypothetical individual with mean values on the covariates at any given time as represented on the X axis. The curve shows how cumulative hazard increases over time for such hypothetical individuals.

LML Function


This is the Log-Minus-Log survival plot of the cumulative survival estimate after the ln(-ln) transformation is applied to the estimate.
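
The ln(-ln) transformation itself is simple to state in code. The sketch below is illustrative only; the survival values are hypothetical, not taken from the Mouse survival output.

```python
import math


def lml(survival):
    """Log-minus-log transform of a survival probability strictly between 0 and 1."""
    if not 0.0 < survival < 1.0:
        raise ValueError("survival must be strictly between 0 and 1")
    return math.log(-math.log(survival))


# Hypothetical survival estimates at successive times (illustration only).
for s in [0.95, 0.80, 0.55, 0.30]:
    print(s, round(lml(s), 3))
```

Since -ln of the survival function is the cumulative hazard, the LML value is also the natural log of the cumulative hazard; it rises as survival falls.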


LML for patterns 1 - 6

Above, for categorical variables, the Log-Minus-Log survival plot is used to determine if hazard functions are proportional across groups of a categorical variable. The X axis is survival time. The Y axis is log minus log. If the LML curves for the groups of the categorical variable are parallel, as they are here, then the hazards are proportional across groups and the researcher rejects the need to conduct stratified Cox regression.
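
Why parallel curves indicate proportionality can be shown numerically. Under the Cox model, S(t|x) = S0(t)^exp(x'B), so the LML curve for a group is the baseline LML curve shifted vertically by x'B at every time. The sketch below uses hypothetical baseline survival values together with the icrf(2) coefficient from the output.

```python
import math


def lml(survival):
    """Log-minus-log transform of a survival probability."""
    return math.log(-math.log(survival))


# Hypothetical baseline survival values at four times (illustration only).
baseline_survival = [0.95, 0.80, 0.55, 0.30]
b = 1.003  # icrf(2) coefficient from Variables in the Equation

gaps = []
for s0 in baseline_survival:
    s_group = s0 ** math.exp(b)  # Cox model: S(t|x) = S0(t)^exp(x'B)
    gaps.append(lml(s_group) - lml(s0))

# Under proportional hazards, every vertical gap equals b.
print([round(g, 3) for g in gaps])
```

When the observed LML curves cross or drift apart, the vertical gap is not constant and the proportional hazards assumption is in doubt for that variable.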

Below: Though not part of the output, the Save button in the Cox regression dialog can also cause any or all of the following numeric variables to be appended to the end of the datasheet, making them available for further analysis. For instance, the DfBeta variables are used to spot highly influential cases, which as outliers may represent coding errors, sampling errors, or cases needing a different explanatory model.

SUR_1	Survival function evaluate at the current case
SE_1	Standard error of the survival function
LML_1	Log-minus-log-of-survival function
HAZ_1	Cumulative hazard function evaluate at the current case
PR1_1	Partial residual for icrf(1)
PR2_1	Partial residual for icrf(2)
PR3_1	Partial residual for icrf(3)
PR4_1	Partial residual for icrf(4)
PR5_1	Partial residual for icrf(5)
PR6_1	Partial residual for hxm
DFB1_1	Dfbeta for icrf(1)
DFB2_1	Dfbeta for icrf(2)
DFB3_1	Dfbeta for icrf(3)
DFB4_1	Dfbeta for icrf(4)
DFB5_1	Dfbeta for icrf(5)
DFB6_1	Dfbeta for hxm
XBE_1	X'Beta
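
One way to use the saved DfBeta variables is to flag cases whose removal would shift a coefficient by a sizable fraction of its standard error. The sketch below is only an illustration: the DfBeta values are hypothetical, not from the data file, and the half-standard-error cutoff is an assumption rather than an SPSS default.

```python
# Hypothetical DFB6_1 values (dfbeta for hxm) keyed by case number; not real data.
dfbeta_hxm = {1: 0.0001, 2: -0.0002, 3: 0.0016, 4: 0.0000, 5: -0.0003}
se_hxm = 0.003         # SE of the hxm coefficient, from Variables in the Equation
cutoff_fraction = 0.5  # assumption: flag shifts larger than half a standard error


def flag_influential(dfbetas, se, fraction):
    """Return sorted case numbers whose |dfbeta| exceeds fraction * se."""
    limit = fraction * se
    return sorted(case for case, value in dfbetas.items() if abs(value) > limit)


print(flag_influential(dfbeta_hxm, se_hxm, cutoff_fraction))
```

Flagged cases are candidates for inspection, not automatic deletion; the researcher should check them for coding or sampling errors first.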