Commit

Merge pull request #74 from egap/external-validity
Updates to external validity guide
jwbowers authored Aug 14, 2024
2 parents f698af5 + 156b775 commit 68251d2
Showing 2 changed files with 77 additions and 65 deletions.
15 changes: 8 additions & 7 deletions external-validity/extval.Rmd
@@ -12,15 +12,15 @@ output:

Abstract
==
-After months or years under development and implementation, navigating the practical, theoretical and inferential pitfalls of experimental social science research, your experiment has finally been completed. Comparing the treatment and control groups, you find a substantively and statistically significant result on an outcome of theoretical interest. Before you can pop the champagne in celebration of an intervention well evaluated, a friendly colleague asks: “But what does this tell us about the world?”
+After months or years under development and implementation, navigating the practical, theoretical and inferential pitfalls of experimental social science research, your experiment has finally been completed. Comparing the treatment and control groups, you find a substantively and statistically significant effect on an outcome of theoretical interest. Before you can pop the champagne in celebration of an intervention well evaluated, a friendly colleague asks: “But what does this tell us about the world?”

1. What is external validity?
==
External validity is another name for the generalizability of results, asking “whether a causal relationship holds over variation in persons, settings, treatments and outcomes.”[^1] A classic example of an external validity concern is whether traditional economics or psychology lab experiments carried out on college students produce results that are generalizable to the broader public. In the political economy of development, we might consider how a community-driven development program in India might apply (or not) in West Africa, or Central America.

[^1]: Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton, Mifflin and Company.

-External validity becomes particularly important when making policy recommendations that come from research. Extrapolating causal effects from one or more studies to a given policy context requires careful consideration of both theory and empirical evidence. This methods guide discusses some key concepts, pitfalls to avoid, and useful references to consider when going from a Local Average Treatment Effect to the larger world.
+External validity becomes particularly important when making policy recommendations that come from research. Extrapolating causal effects from one or more studies to a given policy context requires careful consideration of both theory and empirical evidence. This methods guide discusses some key concepts, pitfalls to avoid, and useful references to consider when going from a Sample Average Treatment Effect to the larger world.

2. How is this different than internal validity?
==
@@ -29,7 +29,7 @@ Internal validity refers to the quality of causal inferences being made for a gi
[^2]: Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological bulletin, 54(4), 297.
[^3]: More details can be found in the causal inference methods guide: https://egap.org/resource/10-things-to-know-about-causal-inference

-Before you can extrapolate a causal effect to a distinct population, it is vital that the original Average Treatment Effect be based on a well-identified result. For most experimentalists, random assignment provides the requisite identifying variation, provided no attrition, interference, spillovers, or other threats to inference. For observational studies, additional identifying assumptions are needed, such as conditional independence of the treatment from potential outcomes.
+Before you can extrapolate a causal effect to a distinct population, it is vital that the original Average Treatment Effect be well-identified. Observational studies often require identifying assumptions such as conditional independence of the treatment from potential outcomes. In an experiment, random assignment ensures that most of these assumptions are guaranteed to hold, at least absent attrition, interference, spillovers, or other threats to inference.

3. Navigating the trade-offs between internal and external validity
==
@@ -39,16 +39,17 @@ On one side of the argument fall advocates of “identification first,” who ar

[^4]: Imbens, G. (2013). Book Review Feature: Public Policy in an Uncertain World: By Charles F. Manski. The Economic Journal,123(570), F401-F411.

-Others argue that even without full identification of an internally valid result, useful information can be salvaged, especially if it is relevant for important questions that affect a broad context. Manski[^5] writes that “what matters is the informativeness of a study for policy making, which depends jointly on internal and external validity.” With data from a broad but a poorly identified study, Manski argues, bounds on the estimand of interest can be generated that, while not as useful as a precise point estimate, still moves science forward.
+Others argue that even without full identification of an internally valid result, useful information can be salvaged, especially if it is relevant for important questions that affect a broad context. Manski[^5] writes that “what matters is the informativeness of a study for policy making, which depends jointly on internal and external validity.” With data from a broad but a poorly identified study, Manski argues, bounds on the estimand of interest can be generated that, while not as useful as a precise point estimate, still move science forward.

[^5]: Manski, C. F. (2013). Response to the review of ‘public policy in an uncertain world’. The Economic Journal 123: F412–F415.

4. Theory and generalization
==
-Extrapolating a result to a distinct context, outcome, population or treatment is not a mechanical process. As discussed by Samii[^6] and Rosenbaum,[^7] relevant theory should be used to guide generalization, taking the relevant existing evidence and making predictions for other contexts in a principled fashion. Theories boil down complex problems into more parsimonious representations, and help to elucidate what factors matter. Just as theory guides the content of interventions and research designs, theoretical propositions can tell you which scope conditions are relevant for extrapolating a result. What covariates matter? What contextual information matters?
+Extrapolating a result to a distinct context, outcome, population or treatment is not a mechanical process. As discussed by Samii[^6] and Rosenbaum,[^7] relevant theory should be used to guide generalization, taking the relevant existing evidence and making predictions for other contexts in a principled fashion. Theories boil down complex problems into more parsimonious representations, and help to elucidate what factors matter. Just as theory guides the content of interventions and research designs, theoretical propositions can shed light on the scope conditions that are relevant for extrapolating a result given the theory's assumptions. See Pearl and Bareinboim (2014) for details on how theoretical models can be used in this way.[^7a] What covariates matter? What contextual information matters?

[^6]: Samii, Cyrus. (2016). “Causal Empiricism in Quantitative Research.” Journal of Politics 78(3):941–955.
[^7]: Rosenbaum, Paul R. (1999). “Choice as an Alternative to Control in Observational Studies” (with discussion). Statistical Science 14(3): 259–304.
+[^7a]: Pearl, Judea, and Elias Bareinboim. 2014. “External Validity: From Do-Calculus to Transportability Across Populations.” Statistical Science 29 (4): 579–95.

5. How can I determine where my results apply?
==
@@ -64,15 +65,15 @@ Because generalization is primarily a prediction exercise, asking where we can e

6. Strategic behavior can scuttle your extrapolations
==
-Extrapolating a local result to a different context can prove challenging even with a compelling covariate profile to which you want to generalize effects. A randomized experimental manipulation in a local area generates a “partial equilibrium effect.” Strategic dynamics, including compensatory behavior or backlashes, outside the local context of an experimental intervention can complicate efforts to generalize a result. Suppose, for example, that an unconditional cash transfer intervention is shown to increase welfare, entrepreneurship, and employment in a sample of 200 villages. What would happen if the intervention were extended to encompass 1000 villages? At this point, one could imagine that regions excluded from the program are more likely to learn about it. Untreated units may start to demand other types of transfers from the government, giving rise to effects similar to those produced by the direct cash transfer. In a similar vein, sometimes causal relationships only work when they are applied to some people. For example, imagine a job skills program that functions very well (as compared to those who did not receive it), what would happen if it were extended to all workers? Even if there are positive effects across all participants, there could be reduced or no average effects as higher skilled jobs are already filled by the first batch and the second batch is forced to remain in their previous jobs, now overqualified. In short, under general equilibrium conditions we might expect different results even where the covariate profile matches.
+Extrapolating a local result to a different context can prove challenging even with a compelling covariate profile to which you want to generalize effects. A randomized experimental manipulation in a local area generates a “partial equilibrium effect.” Strategic dynamics, including compensatory behavior or backlashes, outside the local context of an experimental intervention can complicate efforts to generalize a result -- especially towards an intervention at much larger scale. Suppose, for example, that an unconditional cash transfer intervention is shown to increase welfare, entrepreneurship, and employment in a sample of 200 villages. What would happen if the intervention were extended to encompass 1000 villages? At this point, one could imagine that regions excluded from the program are more likely to learn about it. Untreated units may start to demand other types of transfers from the government, giving rise to effects similar to those produced by the direct cash transfer. In a similar vein, causal relationships sometimes operate for only a certain number of people. For example, imagine a job skills program that is tested with a small group of workers and is shown to increase the likelihood that workers enter higher skilled jobs. What would happen if the program were extended to all workers? Even if there are positive effects on skills across all participants, there could be reduced or no average effects on job placement. If there is a limited supply of high skilled jobs, high-skilled workers may be forced to remain in their previous jobs, now overqualified. In short, under general equilibrium conditions we might expect different results even where covariate profiles match.

7. Don’t confuse external validity with construct validity or ecological validity
==
Internal and external validity are not the only ‘validity’ concerns that can be leveled at experimental work, and though relevant, they are also distinct. Ecological validity, as defined by Shadish, Cook and Campbell[^12] concerns whether an intervention appears artificial or out of place when deployed in a new context. For example, does an information workshop in a rural town carried out by experimenters resemble the kinds of information sharing that the population may experience in regular life? Similarly, if the same workshop were held in a large city, would it appear out of place?

[^12]: Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton, Mifflin and Company.

-Construct validity considers whether a theoretical concept being tested in a study is appropriately operationalized by the treatment(s). If your experiment is testing the effect of anger on political reciprocity and you are in fact manipulating fear or trust in your treatment, construct validity may be violated. Both construct and ecological validity are relevant for generalizations, and thus useful for making claims about external validity.
+Construct validity considers whether a theoretical concept is appropriately operationalized by the treatment(s). (See our guide [10 Things to Know About Measurement in Experiments](https://egap.org/resource/10-things-to-know-about-measurement-in-experiments/) for more on constructs and measurement). If your experiment is testing the effect of anger on political reciprocity and you are in fact manipulating fear or trust in your treatment, construct validity may be violated. Both construct and ecological validity are relevant for generalizations, and thus useful for making claims about external validity.

8. Extrapolation across treatments and outcomes
==