
Validity





    Overview

      A study is valid if its measures actually measure what they claim to, and if there are no logical errors in drawing conclusions from the data. There are a great many labels for different types of validity, but they all have to do with threats and biases which would undermine the meaningfulness of research. Be less concerned about defining and differentiating the types of validity (researchers disagree on the definitions and types, and yes, they do overlap) and be more concerned about all the types of questions one should ask about the validity of research (researchers agree on the importance of the questions).







Key Concepts

  1. Historical background: Some early writers simply equated validity with establishing that a construct's scale correlated with a dependent variable in the intended manner; indeed, a scale might be considered valid as a measure of anything with which it correlated (Guilford 1946). Types of validity were codified in 1954 by the American Psychological Association, which identified four categories: content validity, construct validity, concurrent validity, and predictive validity (APA, 1954). Each type corresponded to a different research purpose: content validity had to do with subject-matter content testing, construct validity with measuring abstract concepts like IQ, concurrent validity with devising new scales or tests to replace existing ones, and predictive validity with devising indicators of future performance. A 1966 update to the APA typology combined the last two types under the label criterion-related validity (APA, 1966). Later, Shepard (1993) was among those who argued that both criterion and content validity were subtypes of construct validity, leaving only one type of validity. This unified view of validity supported the notion that only rarely could a researcher establish validity with reference to a single earlier type. Moreover, Cronbach's (1971: 447) earlier argument that validity could not be established for a test or scale, only for the interpretations researchers might make from it, also became widely accepted. Some, such as Messick (1989), accept construct validity as the only type but argue for multiple standards for assessing it: a valid measure should have relevant content, be based on sound theory or rationale, have internally consistent items, correlate externally with related measures, generalize across populations and time, and be explicit in its social consequences (e.g., racial bias).

    In a nutshell, over the last half century the concept of validation has evolved from establishing correlation with a dependent variable to the idea that researchers must validate each interpretation of each scale, test, or instrument measuring a construct, and must do so in multiple ways which only taken together form the whole of what validity is.

    The outline below largely accepts the unified view of validity, centering on construct validity, but adds to it separate coverage in three areas: (1) content validity, focusing on the labeling of constructs; (2) internal validity, focusing on research design bias; and (3) statistical validity, focusing on meeting assumptions of empirical procedures. While all three might be (and by some are) considered subtypes of construct validity, they do not fall neatly in its two major subdomains, convergent and discriminant validity, and so in the discussion below have been treated separately.

  2. Construct validity, sometimes also called factorial validity, has to do with the logic of the items which make up measures of social concepts. A good construct has a theoretical basis which is translated through clear operational definitions involving measurable indicators. A poor construct may be characterized by lack of theoretical agreement on its content, or by flawed operationalization such that its indicators may be construed as measuring one thing by one researcher and another thing by another researcher. A construct is a way of defining something, and to the extent that a researcher's proposed construct is at odds with the existing literature on related hypothesized relationships using other measures, its construct validity is suspect. For this reason, the more a construct is used by researchers in more settings with outcomes consistent with theory, the greater its construct validity. Researchers should establish both of the two main types of construct validity, convergent and discriminant, for their constructs.
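
    As a rough illustration of the last point, convergent validity is commonly examined by checking that a scale correlates strongly with other measures of the same construct, and discriminant validity by checking that it correlates only weakly with measures of theoretically distinct constructs. The Python sketch below assumes that simple correlational reading. The scores, variable names, and the pearson helper are hypothetical and made up for illustration, and no particular cutoff for "strong" or "weak" is implied.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for eight respondents (invented numbers, not real data).
new_scale = [12, 15, 11, 18, 20, 14, 17, 13]      # proposed measure of the construct
related_scale = [30, 36, 29, 41, 45, 33, 40, 31]  # established measure of the same construct
unrelated_scale = [5, 6, 7, 3, 6, 4, 7, 2]        # measure of a theoretically distinct construct

# Convergent evidence: the new scale should track the related measure closely.
convergent_r = pearson(new_scale, related_scale)

# Discriminant evidence: the new scale should not track the unrelated measure.
discriminant_r = pearson(new_scale, unrelated_scale)

print(f"convergent r   = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

    With these invented scores the first correlation is high and the second is near zero, which is the pattern a researcher would hope to see. In practice such evidence would be only one of several strands supporting an interpretation of the scale.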

  3. Content validity, also called face validity, has to do with items seeming to measure what they claim to (studies can be internally valid and statistically valid, yet use measures lacking face validity). In content validity one is also concerned with whether the items measure the full domain implied by their label. Though content validity is derogated by some psychometricians as too subjective, failure of the researcher to establish it credibly may easily lead to rejection of his or her findings. Surveys of panels of content experts and focus groups of representative subjects are ways in which content validity may be established, albeit using subjective judgments.

    Are the measures which operationalize concepts ones which seem by common sense to have to do with the concept? Or could there be a naming fallacy? Indicators may display construct validity, yet the label attached to the concept may be inappropriate.

  4. Internal validity has to do with defending against sources of bias arising in research design, which would affect the cause-effect process being studied by introducing covert variables. When internal validity is lacking, variables other than the independent variable(s) being studied may be responsible for part or all of the observed effect on the dependent variable(s). If there is no causal phenomenon under study, internal validity is not at issue.

  5. Statistical validity has to do with basing conclusions on proper use of statistics. Violation of statistical assumptions is treated elsewhere, in the discussion of each specific statistical procedure. In addition, the following general questions may be asked of any study:



Bibliography



Copyright 1998, 2008, 2009 by G. David Garson.
Last update, 4/24/2009.