Overview
Bootstrapping, a form of resampling sometimes described as Monte Carlo estimation, uses brute computing power to establish confidence intervals for any test statistic, based not on assumptions such as a multivariate normal distribution but on repeated samples drawn from the researcher's own data. As such, it is a nonparametric method of statistical inference. That is, rather than using generic distribution tables (e.g., normal distribution tables) to compute approximate p-values, resampling generates a unique sampling distribution from the actual data at hand and uses experimental rather than analytic methods. Unlike approximation with generic distribution tables, resampling does not require the data to match an assumed theoretical distribution, because its estimates are based on repeated samples of the outcomes observed in the data being studied. However, there is an increased danger of overfitting to noise in the data, a problem which may be addressed by combining bootstrapping methods with cross-validation.
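The resampling idea described in the overview can be sketched in a few lines of Python (used here only because it is easy to run; the SPSS examples that follow are the document's own). This is a minimal sketch of a percentile bootstrap interval, assuming simple case resampling with replacement. The function name `bootstrap_ci`, its parameters, and the sample values are hypothetical and chosen purely for illustration.

```python
import random
import statistics

def bootstrap_ci(data, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic` on `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate: draw n cases with replacement, then recompute the statistic.
    replicates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Nearest-rank positions of the alpha/2 and 1 - alpha/2 percentiles.
    lo = replicates[round((alpha / 2) * (n_resamples - 1))]
    hi = replicates[round((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi

# Illustrative values only (an assumption, not data from any study).
sample = [12.1, 9.8, 11.4, 10.3, 14.9, 9.1, 10.7, 13.2, 11.9, 10.0]
low, high = bootstrap_ci(sample, statistics.median)
print(f"median = {statistics.median(sample):.2f}, "
      f"95% interval = ({low:.2f}, {high:.2f})")
```

No distribution table is consulted at any point: the interval comes entirely from the empirical distribution of the recomputed statistic.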
Example
MODEL PROGRAM A=.5 B=1.6.
COMPUTE PSTOP=A*SPEED**B.
**** Generate nonlinear parameter estimates with standard errors, confidence intervals and parameter correlations.
**** Save the estimates in a file called PARAM.
CNLR STOP /BOOTSTRAP /OUTFILE=PARAM.
**** GET retrieves the saved PARAM system file.
GET FILE=PARAM.
**** LIST lists the sample estimates.
LIST.
COMPUTE ID=$CASENUM.
SELECT IF (ID > 1).
**** FREQUENCIES generates histograms for the bootstrapped parameter estimates.
FREQUENCIES A B /FORMAT=NOTABLE /HISTOGRAM.
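For readers without SPSS, the logic of the example above (fit STOP = A*SPEED**B, refit on resampled cases, then summarize the saved estimates) can be approximated in Python. This is only a rough sketch under stated assumptions: the data are synthetic, generated from A=0.5 and B=1.6 with random noise; the fit uses least squares on the logarithms rather than the nonlinear estimation CNLR performs; and the names `fit_power_law` and `bootstrap_fits` are hypothetical.

```python
import math
import random
import statistics

def fit_power_law(speeds, stops):
    """Estimate (a, b) in stop = a * speed**b by least squares on the logs."""
    xs = [math.log(s) for s in speeds]
    ys = [math.log(d) for d in stops]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return math.exp(my - b * mx), b

def bootstrap_fits(speeds, stops, n_resamples=500, seed=1):
    """Refit the model on cases resampled with replacement."""
    rng = random.Random(seed)
    cases = list(zip(speeds, stops))
    fits = []
    while len(fits) < n_resamples:
        xs, ys = zip(*rng.choices(cases, k=len(cases)))
        if len(set(xs)) < 2:  # slope is undefined if every speed is identical
            continue
        fits.append(fit_power_law(xs, ys))
    return fits

# Synthetic data (an assumption for illustration, not real measurements).
data_rng = random.Random(42)
speeds = [float(s) for s in range(5, 45, 2)]
stops = [0.5 * s ** 1.6 * math.exp(data_rng.gauss(0, 0.1)) for s in speeds]

a_hat, b_hat = fit_power_law(speeds, stops)
fits = bootstrap_fits(speeds, stops)
a_reps = [a for a, _ in fits]
b_reps = [b for _, b in fits]

# Bootstrap standard errors and the correlation between the two estimates.
se_a, se_b = statistics.stdev(a_reps), statistics.stdev(b_reps)
ma, mb = statistics.fmean(a_reps), statistics.fmean(b_reps)
cov_ab = sum((a - ma) * (b - mb) for a, b in fits) / (len(fits) - 1)
corr_ab = cov_ab / (se_a * se_b)
print(f"a = {a_hat:.3f} (SE {se_a:.3f}), b = {b_hat:.3f} (SE {se_b:.3f}), "
      f"corr(a, b) = {corr_ab:.3f}")
```

The lists `a_reps` and `b_reps` play the role of the bootstrapped estimates that the SPSS example plots as histograms.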
***oms_bootstrapping.sps***.
***if c:\temp is not a valid drive\path, replace all instances of c:\temp
with a valid drive\path.
PRESERVE.
SET TVARS NAMES.
*first OMS command just suppresses Viewer output.
OMS /DESTINATION VIEWER=NO /TAG='suppressall'.
*select regression coefficients tables and write to data file.
OMS /SELECT TABLES
/IF COMMANDS=['Regression'] SUBTYPES=['Coefficients']
/DESTINATION FORMAT=SAV OUTFILE='c:\temp\temp.sav'
/COLUMNS DIMNAMES=[ 'Variables' 'Statistics']
/TAG='reg_coeff'.
*define a macro to draw samples with replacement and
run Regression commands.
DEFINE regression_bootstrap (samples=!TOKENS(1)
/depvar=!TOKENS(1)
/indvars=!CMDEND)
COMPUTE dummyvar=1.
AGGREGATE
/OUTFILE = * MODE = ADDVARIABLES
/BREAK=dummyvar
/filesize=N.
!DO !other=1 !TO !samples
SET SEED RANDOM.
WEIGHT OFF.
FILTER OFF.
DO IF $casenum=1.
- COMPUTE #samplesize=filesize.
- COMPUTE #filesize=filesize.
END IF.
DO IF (#samplesize>0 and #filesize>0).
- COMPUTE sampleWeight=rv.binom(#samplesize, 1/#filesize).
- COMPUTE #samplesize=#samplesize-sampleWeight.
- COMPUTE #filesize=#filesize-1.
ELSE.
- COMPUTE sampleWeight=0.
END IF.
WEIGHT BY sampleWeight.
FILTER BY sampleWeight.
REGRESSION
/STATISTICS COEFF
/DEPENDENT !depvar
/METHOD=ENTER !indvars.
!DOEND
!ENDDEFINE.
***insert any valid path\data file name***.
GET FILE='c:\Program Files\SPSS13\Employee data.sav'.
***Call the macro, and specify number of samples,
dependent variable, and independent variables.
regression_bootstrap
samples=100
depvar=salary
indvars=salbegin jobtime .
OMSEND.
GET FILE='c:\temp\temp.sav'.
FREQUENCIES
VARIABLES=salbegin_B salbegin_Beta jobtime_B jobtime_Beta
/FORMAT NOTABLE
/PERCENTILES= 2.5 97.5
/HISTOGRAM NORMAL.
RESTORE.
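The macro above draws each sample with replacement by assigning every case a weight: a binomial count based on the draws still to be allocated and the cases still to be visited, so that the weights always sum to the file size. The Python sketch below mirrors that weighting step and then reads off the 2.5 and 97.5 percentiles of a bootstrapped slope, as the closing FREQUENCIES command does for the saved coefficients. It is an illustration under assumptions, not a translation of the macro: the data are synthetic, the model has a single predictor, and `bootstrap_weights` and `weighted_slope` are hypothetical names.

```python
import random

def bootstrap_weights(n_cases, rng):
    """Per-case counts for one with-replacement sample of size n_cases."""
    remaining_draws = n_cases
    remaining_cases = n_cases
    weights = []
    for _ in range(n_cases):
        if remaining_draws > 0 and remaining_cases > 0:
            p = 1.0 / remaining_cases
            # Binomial(remaining_draws, p) drawn as a sum of Bernoulli trials.
            w = sum(rng.random() < p for _ in range(remaining_draws))
            remaining_draws -= w
            remaining_cases -= 1
        else:
            w = 0
        weights.append(w)
    return weights

def weighted_slope(xs, ys, ws):
    """Least-squares slope of y on x with integer case weights."""
    total = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / total
    my = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:  # all weight fell on a single x value
        return None
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return sxy / sxx

# Synthetic data (an assumption for illustration): true slope is 2.
data_rng = random.Random(7)
xs = [float(x) for x in range(1, 31)]
ys = [2.0 * x + data_rng.gauss(0, 5) for x in xs]

rng = random.Random(2008)
slopes = []
while len(slopes) < 1000:
    slope = weighted_slope(xs, ys, bootstrap_weights(len(xs), rng))
    if slope is not None:
        slopes.append(slope)

slopes.sort()
lo = slopes[round(0.025 * (len(slopes) - 1))]
hi = slopes[round(0.975 * (len(slopes) - 1))]
print(f"95% percentile interval for the slope: ({lo:.3f}, {hi:.3f})")
```

Weighting cases rather than physically copying rows keeps the data file unchanged between replicates, which is presumably why the macro turns WEIGHT and FILTER off and on again inside the loop.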
Copyright 1998, 2008 by G. David Garson.
Last updated 6/21/08.