Overview
Sampling in conjunction with survey research is one of the most popular approaches to data collection in the social sciences, and random sampling is also the foundational assumption for much of inferential statistics and significance testing. Most national and other large samples drawn by governmental agencies and polling firms use complex sampling designs, such as multistage or stratified sampling. These complex designs mean that significance tests must be computed differently. That is, most significance tests in statistical software are somewhat inaccurate when applied to complex samples. Specialized software, discussed in the section on complex samples, should be used to obtain the highest level of accuracy. See also the section on Survey Research.
Significance testing is not appropriate for non-random samples or for enumerations/censuses. We would like to make similar inferences for non-random samples, but that is impossible. For an enumeration, any relationship, no matter how small, is a true relationship (barring measurement error).
Simple random sampling is common when the sampling frame is small. Starting at an arbitrary point in a table of random numbers, such as is found in most statistics books, sets of random numbers are read and associated with members of the sampling frame. The first three digits might be 712, for instance, and thus the 712th person in the sampling frame might be selected. If the sampling frame had over 999 people, then four-digit random sequences would be used. Selections outside the range of members of the frame would be ignored. A simpler, more modern method uses pseudo-random number generating software (the kind used to construct the tables of random numbers found in books), such as that built into SPSS: simply request that the software generate n random integers between 1 and N, where n is the sample size and N is the population size.
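The software-based procedure just described can be sketched in a few lines of Python. This is a minimal illustration rather than the SPSS procedure itself: the function name `simple_random_sample`, the frame size of 999, and the seed are assumptions made for the example, and the draw is made without replacement so that no member of the frame is selected twice.

```python
import random


def simple_random_sample(frame_size, sample_size, seed=None):
    """Return sorted 1-based positions of a simple random sample.

    Positions are drawn without replacement from 1..frame_size, so every
    member of the sampling frame has the same chance of selection.
    """
    if not 0 <= sample_size <= frame_size:
        raise ValueError("sample_size must be between 0 and frame_size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # range(1, frame_size + 1) numbers the frame members 1..N
    return sorted(rng.sample(range(1, frame_size + 1), sample_size))


# Hypothetical frame of N = 999 people, sample of n = 10
picks = simple_random_sample(frame_size=999, sample_size=10, seed=712)
print(picks)
```

Each number printed is the position of a selected member in the sampling frame, playing the same role as the 712 read from a printed table of random numbers.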
Warnings on Use of Cluster Sampled Data
Clustering will produce correlated observations, which violates the assumption of independently sampled cases - an assumption of many statistical techniques. Multi-level modeling is an example of a technique which is appropriate for clustered samples (see Goldstein, 1995). Nonetheless, it should be noted that it is common practice to treat data from cluster sampling as if it were randomly sampled data.
Overall, multi-stage or cluster sampling is usually less precise than simple random sampling, which in turn is less precise than one-stage stratified sampling. Warning: Since multistage sampling is the most prevalent form for large, national surveys, and since most computer programs use standard error algorithms based on the assumption of simple random samples, the standard errors reported in the literature often underestimate sampling error, leading to too many Type I errors (false positives). See the discussion below regarding estimation and software for complex samples.
| Survey form | Share of all forms | Interview taxpayer # if 1 taxpayer in residence | Interview taxpayer # if 2 taxpayers in residence | Interview taxpayer # if 3 taxpayers in residence |
| A | 1/3 | 1 | 1 | 1 |
| B | 1/6 | 1 | 1 | 2 |
| C | 1/6 | 1 | 2 | 2 |
| D | 1/3 | 1 | 2 | 3 |
In this example, there are four different forms of the survey (A, B, C, D), printed and randomly distributed in the proportions in the table above. Each survey form would have one of the rows in the table. For instance, in form B, if there were two taxpayers in the residence then the interviewer would interview taxpayer number 1. For forms C or D, taxpayer number 2 would be interviewed. For form A, taxpayer number 1 would be interviewed. Kalton (1983: 61) presents the table for the assumption of a maximum of four selectable subjects per residence. Similar tables can be constructed for any assumption. All such selection grid tables equalize the probability of any appropriate individual being chosen for inclusion in the sample.
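The claim that the grid equalizes selection probabilities can be checked by adding up the shares of the forms that point to each taxpayer. The sketch below does this with exact fractions for the four-form grid above; the names `GRID` and `selection_probabilities` are illustrative choices, not part of any survey package.

```python
from fractions import Fraction

# form -> (share of all forms, taxpayer to interview when 1, 2, or 3 are present)
GRID = {
    "A": (Fraction(1, 3), (1, 1, 1)),
    "B": (Fraction(1, 6), (1, 1, 2)),
    "C": (Fraction(1, 6), (1, 2, 2)),
    "D": (Fraction(1, 3), (1, 2, 3)),
}


def selection_probabilities(household_size):
    """Probability that each taxpayer in a residence of this size is interviewed."""
    probs = {t: Fraction(0) for t in range(1, household_size + 1)}
    for share, picks in GRID.values():
        # picks[household_size - 1] is the taxpayer this form selects
        probs[picks[household_size - 1]] += share
    return probs


for size in (1, 2, 3):
    print(size, selection_probabilities(size))
```

With two taxpayers, forms A and B together (1/3 + 1/6) select taxpayer 1 and forms C and D select taxpayer 2, so each has probability 1/2. With three taxpayers each is selected with probability 1/3.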
A similar, simpler approach is the "last birthday" method, whereby the researcher asks the number of adults in the household, then interviews the sole adult in one-adult households; in two-adult households, interviews the adult with the most recent birthday every other time and the other adult every other time; in three-adult households, rotates every third time; and so on. Asking about birthdays rather than ages may be less sensitive. Some evidence suggests that in telephone surveys, these methods may require additional screening and interview time, and may generate too many callbacks or refusals. In face-to-face interviews there is greater subject tolerance for such screening questions.
Bourque and Clark (1992: 60) state, "It has been our experience that the use of weights does not substantially change estimates of the sample mean unless nonrespondents are appreciably different from respondents and there is a substantial proportion of nonrespondents."
Combinations of these factors create complexities which, in combination with lack of knowledge about the population to be sampled, usually make sample size estimation just that -- an arbitrary estimate. Note that, unless the sample is a sizable fraction of the population, needed sample size depends very little on the size of the population to be sampled. Even in the most complex analyses, samples over 1,500 are very rarely needed. Specialized software, such as the “SamplePower” module which can be added to base SPSS, exists to help the researcher calculate needed sample size. See http://www.spss.com/samplepower/. Other such software, as well as online sample size calculators, may be found on the web. There are also rules of thumb and various manual methods.
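As one illustration of a manual method, the sketch below applies the standard textbook formula for estimating a proportion from a simple random sample, n = z²p(1 − p)/e². It is not necessarily the calculation SamplePower performs, and it ignores design effects and nonresponse; the function name and the default values (p = 0.5 as the most conservative case, 95% confidence) are assumptions made for the example. Population size does not appear in the formula.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5):
    """Cases needed to estimate a proportion within +/- margin_of_error.

    Textbook formula for a simple random sample: n = z^2 * p * (1 - p) / e^2.
    p = 0.5 is the most conservative assumption about the true proportion.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    # two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


for e in (0.05, 0.03):
    print(f"+/-{e:.0%}: n = {sample_size_for_proportion(e)}")
```

Under these assumptions a margin of plus or minus 5 points requires 385 cases and plus or minus 3 points requires 1,068, which is in line with the observation that samples over 1,500 are rarely needed.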
"Because some authors (ex., Oakes, 1986) note the use of inferential statistics is warranted for nonprobability samples if the sample seems to represent the population, and in deference to the widespread social science practice of reporting significance levels for nonprobability samples as a convenient if arbitrary assessment criterion, significance levels have been reported in the tables included in this article." See Michael Oakes (1986). Statistical inference: A commentary for social and behavioral sciences. NY: Wiley.
Another rough rule of thumb, based on chi-square testing needs, may be followed:
However, as mentioned, these are simple rule-of-thumb methods and more satisfactory estimates require taking more complex factors into account.
SPSS distributes an add-on module called Complex Samples for purposes of significance testing with multistage and other complex samples. In turn, it has the following sub-modules:
The replication method is implemented by WesVar Complex Samples software, distributed by SPSS, Inc. Note that WesVar is limited in the statistical procedures for which it provides adjusted significance tests. Other software handling estimation and variance estimation under stratified and other unequal probability sampling methods includes SUDAAN (from the Research Triangle Institute) and VPLX (from the Bureau of the Census).
Lee, Forthofer, and Lorimor also discuss three other methods of variance estimation: (1) balanced repeated replication, used primarily in paired selection designs; (2) repeated replication through jackknifing, which is based on pseudo-replication involving serially leaving cases out in successive subsamples; and (3) the Taylor series method, which is less computationally intensive but cannot handle certain types of estimates, such as estimates of medians or percentiles.
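The leave-one-out idea behind jackknifing can be sketched as follows. This is a deliberately simplified, unstratified delete-one jackknife on individual cases, shown only to make the mechanics concrete; jackknife procedures for complex samples typically drop whole primary sampling units within strata instead. The function name and the data values are made up for illustration.

```python
from statistics import mean, variance


def jackknife_variance(values, estimator=mean):
    """Delete-one jackknife estimate of the sampling variance of an estimator."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the estimate n times, each time leaving one case out.
    replicates = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    center = mean(replicates)
    # Scale the spread of the replicate estimates by (n - 1) / n.
    return (n - 1) / n * sum((r - center) ** 2 for r in replicates)


data = [2, 4, 4, 5, 7, 9]  # hypothetical observations
print(jackknife_variance(data))      # jackknife variance of the mean
print(variance(data) / len(data))    # usual s^2 / n, for comparison
```

For the sample mean the two printed values agree up to floating-point rounding, which is a convenient check that the replicate bookkeeping is right. The same function accepts other estimators through the `estimator` argument.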
It should be noted that in practice, most social science researchers use the default significance tests generated by SPSS and other leading statistical packages, whose defaults are based on the assumption of simple random sampling. This is justified on the argument that in most cases, conclusions are not affected. However, ignoring complex sample design and using simple random sampling methods runs the risk of biased estimates (Landis et al., 1982; Kish, 1992; Korn and Graubard, 1995).
Resampling is an alternative inductive approach to significance testing, now becoming more popular in part because of the complexity and difficulty of applying traditional significance tests to complex samples.
Copyright 1998, 2008, 2009 by G. David Garson
Last update 1/14/2009