
Commit

malisi committed Jun 16, 2022
2 parents 46f92e0 + da8d3a7 commit 7a135ae
Showing 31 changed files with 1,721 additions and 4,355 deletions.
12 changes: 12 additions & 0 deletions .github/workflows/binder.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
name: Build and Deploy Binder Image Currently for Latex Guide
on: [push]

jobs:
Create-MyBinderOrg-Cache:
runs-on: ubuntu-latest
steps:
- name: cache binder build on mybinder.org
uses: jupyterhub/repo2docker-action@master
with:
NO_PUSH: true
MYBINDERORG_TAG: ${{ github.event.ref }} # This builds the container on mybinder.org from the branch that was pushed.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@
.DS_Store
*/*_cache*
*settings.dcf
R-dev/
4 changes: 4 additions & 0 deletions Makefile
@@ -18,6 +18,7 @@ all: adaptive/adaptive.html \
meta-analysis/meta-analysis.html \
missing_data/missing_data.html \
multiple-comparisons/multiple-comparisons.html \
multisite/multisite.html \
null/null_results.html \
pap/pap.html \
pilots/10_things_to_know_about_pilots.html \
@@ -106,6 +107,9 @@ missing_data/missing_data.html: missing_data/missing_data.Rmd
multiple-comparisons/multiple-comparisons.html: multiple-comparisons/multiple-comparisons.Rmd
Rscript -e "rmarkdown::render('./multiple-comparisons/multiple-comparisons.Rmd')"

multisite/multisite.html: multisite/multisite.Rmd multisite/refs.bib
Rscript -e "rmarkdown::render('./multisite/multisite.Rmd')"

null/null_results.html: null/null_results.Rmd
Rscript -e "rmarkdown::render('./null/null_results.Rmd')"

23 changes: 23 additions & 0 deletions NotUsedDockerfile
@@ -0,0 +1,23 @@
# Trying this approach with a pre-built Docker image
# following https://mybinder.readthedocs.io/en/latest/tutorials/dockerfile.html
# Note that there must be a tag
FROM jwbowers/methods-guides:d177a56ed870

ARG NB_USER=jovyan
ARG NB_UID=1000
ENV USER ${NB_USER}
ENV NB_UID ${NB_UID}
ENV HOME /home/${NB_USER}

RUN adduser --disabled-password \
--gecos "Default user" \
--uid ${NB_UID} \
${NB_USER}

# Make sure the contents of our repo are in ${HOME}
COPY . ${HOME}
USER root
RUN chown -R ${NB_UID} ${HOME}
USER ${NB_USER}

RUN python3 -m pip install --no-cache-dir notebook jupyterlab
2 changes: 1 addition & 1 deletion hte/heteffects.Rmd
@@ -287,7 +287,7 @@ FDR control tends to be less conservative than FWER control and is popular in fi

8 Use a Pre-Analysis Plan To Reduce the Number of Hypothesis Tests
==
You can also reduce the numbers of CATEs and interactions under consideration for hypothesis testing by pre-specifying the tests of primary interest in a registered pre-analysis plan (PAP). Additional subgroup analyses can be conceptualized and specified as exploratory or descriptive analyses in the PAP. Another bonus is that if you prefer a one-sided test, you can commit to that choice in the PAP before seeing the outcome data, so that you "cannot be justly accused of cherry-picking the test after the fact" (Olken 2015).^[Benjamin A. Olken (2015), "Promises and Perils of Pre-Analysis Plans," _Journal of Economic Perspectives_ 29(3): 61--80.]
You can also reduce the numbers of CATEs and interactions under consideration for hypothesis testing by pre-specifying the tests of primary interest in a registered pre-analysis plan (PAP). Additional subgroup analyses can be conceptualized and specified as exploratory or descriptive analyses in the PAP. Another bonus is that if you prefer a one-sided test, you can commit to that choice in the PAP before seeing the outcome data, so that you "cannot be justly accused of cherry-picking the test after the fact" (Olken 2015).^[Benjamin A. Olken (2015), "Promises and Perils of Pre-Analysis Plans," _Journal of Economic Perspectives_ 29(3): 61--80.] See our guide [10 Things to Know About Pre-Analysis Plans](https://egap.org/resource/10-things-to-know-about-pre-analysis-plans/) for more on pre-registration.

9 Automate the Search for Interactions
==
8 changes: 7 additions & 1 deletion hte/heteffects.html
@@ -266,6 +266,7 @@




<style type="text/css">
.main-container {
max-width: 940px;
@@ -287,6 +288,9 @@
summary {
display: list-item;
}
details > summary > p:only-child {
display: inline;
}
pre code {
padding: 0;
}
@@ -872,7 +876,9 @@ <h1>8 Use a Pre-Analysis Plan To Reduce the Number of Hypothesis
descriptive analyses in the PAP. Another bonus is that if you prefer a
one-sided test, you can commit to that choice in the PAP before seeing
the outcome data, so that you “cannot be justly accused of
cherry-picking the test after the fact” (Olken 2015).<a href="#fn13" class="footnote-ref" id="fnref13"><sup>13</sup></a></p>
cherry-picking the test after the fact” (Olken 2015).<a href="#fn13" class="footnote-ref" id="fnref13"><sup>13</sup></a> See our guide <a href="https://egap.org/resource/10-things-to-know-about-pre-analysis-plans/">10
Things to Know About Pre-Analysis Plans</a> for more on
pre-registration.</p>
</div>
<div id="automate-the-search-for-interactions" class="section level1">
<h1>9 Automate the Search for Interactions</h1>
1 change: 1 addition & 0 deletions hte/heteffects_cache/html/__packages
@@ -0,0 +1 @@
base
52 changes: 25 additions & 27 deletions hypothesistesting/hypothesistesting.Rmd
@@ -131,29 +131,25 @@ inference."

## An overview of estimation based approaches to causal inference in randomized experiments.

There are three main ways that the statistical sciences have engaged with this
problem. That is, when asked, "Does information cause people to pay their
There are three main ways that the statistical sciences have engaged with the fundamental
problem of causal inference. All of these ways involve changing the target of inference. That is, when asked, "Does information cause people to pay their
taxes?" we tend to say, "We cannot answer that question directly. However, we can
answer a related question." [Ten Types of Treatment Effect You Should Know About](https://egap.org/resource/10-types-of-treatment-effect-you-should-know-about/) describes an insight that we credit to Jerzy Neyman
where a scientist can **estimate average causal effects** in a randomized
experiment even if individual causal effects are unobservable. Judea
Pearl's work on estimating the conditional probability of an outcome
based on a causal model of that outcome is similar to this idea, where a focus
is on the conditional probabilities of the $y_{i}$'s. That is, those two approaches answer the
fundamental causal question by changing the question focus on averages
or conditional probabilities. A related approach, from Don Rubin begins by **predicting** the
individual level potential outcomes using background information and a
probability model of $Z_i$ (such that, say, $Z \sim \text{Bernoulli}(\pi)$) and a
probability model of the two potential outcomes such that, say,
$(y_{i,1},y_{i,0}) \sim \text{Multivariate Normal}(\bbeta \bX, \bSigma)$ with a vector of coefficients $\bbeta$, an $n \times p$ matrix of variables $\bX$ (containing both treatment assignment and other variables and a $p \times p$ variance-covariance matrix $\Sigma$ describing how all of the columns in $\bX$ relate to one another).

The second general approach starts with such probability models relating treatment, other variables, and outcomes to each other, and
combines them using Bayes Rule to produce posterior distributions for
answer a related question."

The first approach changes the question from whether information causes a particular person to pay her taxes to whether information causes people to pay their taxes *on average*. [Ten Types of Treatment Effect You Should Know About](https://egap.org/resource/10-types-of-treatment-effect-you-should-know-about/) describes how a scientist can **estimate average causal effects** in a randomized experiment even though individual causal effects are unobservable. This insight is credited to Jerzy Neyman. Judea
Pearl's work on estimating the conditional probability distribution of an outcome based on a causal model of that outcome is similar to this idea. Both approaches address the
fundamental problem of causal inference by changing the question to focus on averages
or conditional probabilities rather than individuals.

A related approach that is due to Don Rubin involves **predicting** the
individual level potential outcomes. The predictions are based on a
probability model of treatment assignment $Z_i$ (for example, $Z \sim \text{Bernoulli}(\pi)$) and a
probability model of the two potential outcomes (for example,
$(y_{i,1},y_{i,0}) \sim \text{Multivariate Normal}(\bX \bbeta, \bSigma)$ with a vector of coefficients $\bbeta$, an $n \times p$ matrix of variables $\bX$ (containing both treatment assignment and other background information) and a $p \times p$ variance-covariance matrix $\bSigma$ describing how all of the columns in $\bX$ relate to one another). The probability models relate treatment, background information, and outcomes to one another. The approach combines these models with data using Bayes' Rule to produce posterior distributions for
quantities like the individual level treatment effects or average treatment
effects (see [@imbens2007causal] for more on what they call the Bayesian
Predictive approach to causal inference). So, the predictive approach changes
the fundamental question from one about averages to one that focuses on differences in predicted potential
outcomes for each person (although mostly these individual predicted differences are summarized using characteristics of the posterior distributions implied by the probability models and the data like the average of the predictions.)
Predictive approach to causal inference). So, this predictive approach focuses not on averages but on differences in predicted potential
outcomes for each person (although in practice these individual predicted differences are summarized using characteristics of the posterior distributions, like the average of the predictions).

## Hypothesis testing is a statistical approach to the fundamental problem of causal inference using claims about the unobserved.

@@ -237,9 +233,11 @@ per person, depending on which treatment was assigned to that person. So, we
can link the unobserved counterfactual outcomes to an observed outcome ($Y_i$)
using treatment assignment ($Z_i$) like so:

$$ Y_i = Z_i y_{i,1} + (1 - Z_i) y_{i,0} $$ {#eq:identity}
\begin{equation}
Y_i = Z_i y_{i,1} + (1 - Z_i) y_{i,0} (\#eq:identity)
\end{equation}

@eq:identity says that our observed outcome, $Y_i$ (here, amount of
\@ref(eq:identity) says that our observed outcome, $Y_i$ (here, amount of
taxes paid by person $i$), is $y_{i,1}$ when the person is assigned to the
treatment group ($Z_i=1$), and $y_{i,0}$ when the person is assigned to the
control group.
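The switching identity above can be illustrated numerically. Here is a minimal sketch in Python (the guide's own code is in R; the potential outcomes, assignment vector, and variable names below are made up for illustration):

```python
# Sketch of the switching equation Y_i = Z_i * y_{i,1} + (1 - Z_i) * y_{i,0}.
# Potential outcomes y0, y1 and assignment Z are illustrative made-up numbers.
y0 = [10, 20, 30, 40]  # taxes paid if person i were assigned to control
y1 = [15, 25, 35, 45]  # taxes paid if person i were assigned to treatment
Z = [1, 0, 1, 0]       # randomized treatment assignment

# The observed outcome reveals exactly one potential outcome per person.
Y = [z * a + (1 - z) * b for z, a, b in zip(Z, y1, y0)]
print(Y)  # -> [15, 20, 35, 40]
```

Note that for each person only one of the two potential outcomes ever appears in `Y`; the other remains counterfactual, which is the fundamental problem of causal inference in miniature.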
@@ -251,7 +249,7 @@

Let us entertain this model for the sake of argument. What would this
hypothesis imply for what we observe? We have the equation relating observed
to unobserved in @eq:identity so, this model or hypothesis would
to unobserved in \@ref(eq:identity) so, this model or hypothesis would
imply that:

$$ \begin{aligned} Y_i & = Z_i ( y_{i,0} + \tau_i ) + ( 1 - Z_i) y_{i,0} \\
@@ -458,8 +456,8 @@ mtext(side=3,outer=TRUE,text=expression(paste("Distributions of Test Statistics
```

To formalize the comparison between observed and hypothesized, we can
calculate the proportion of the hypothetical experiments that yield test
statistics greater than the observed experiment. In the left panel of the
calculate a $p$-value, i.e., the proportion of the hypothetical experiments that yield test
statistics greater than or equal to that of the observed experiment. In the left panel of the
figure we see that a wide range of
differences of means between treated and control groups are compatible with
the treatment having no effects (with the overall range between `r min(possibleMeanDiffsH0)` and `r max(possibleMeanDiffsH0)`). The right panel shows
@@ -469,7 +467,7 @@

### One-sided $p$-values

The one-sided $p$-values are `r pMeanTZ` for the simple mean difference and `r signif(pMeanRankTZ,2)` for the mean difference of the rank-transformed outcomes. Each
One-sided $p$-values capture the probability that a test statistic is as big as or bigger than the observed one (upper $p$-value) or as small as or smaller than the observed one (lower $p$-value). Here, the one-sided $p$-values are `r pMeanTZ` for the simple mean difference and `r signif(pMeanRankTZ,2)` for the mean difference of the rank-transformed outcomes. Each
test statistic casts a different amount of doubt, or quantifies a different
amount of surprise, about the same null hypothesis of no effects. The outcome
itself is so noisy that the mean
@@ -515,7 +513,7 @@ mean( possibleMeanRankDiffsH0 <= observedMeanRankTZ ))

In this case the two-sided $p$-values are `r p2SidedMeanTZ` and `r p2SidedMeanRankTZ` for the simple mean differences and means differences of ranks respectively. We interpret them in terms of "extremity" --- we would only see an observed mean difference as far away from zero as the one manifest in our results roughly 18% of the time, for example.
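The "extremity" interpretation of a two-sided $p$-value can be sketched directly: count how often a hypothetical statistic lands at least as far from zero as the observed one. This Python fragment uses made-up numbers purely to illustrate the counting rule (the guide computes its two-sided $p$-values in R):

```python
# Two-sided p-value by "extremity": how often is a hypothetical test
# statistic at least as far from zero as the observed one?
sims = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.0, 1.5, 2.5]  # hypothetical statistics
observed = 1.5

p_two_sided = sum(abs(s) >= abs(observed) for s in sims) / len(sims)
print(p_two_sided)  # -> 0.5
```

Here four of the eight hypothetical statistics (-2.0, -1.5, 1.5, and 2.5) are at least as extreme as 1.5 in absolute value, so the two-sided $p$-value is 4/8.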

**As a side note** The test of the sharp null shown here can be done without
**As a side note:** The test of the sharp null shown here can be done without
writing the code yourself. The code that you'll see here (by clicking the code
buttons) shows how to use different R packages to test hypotheses using
randomization-based inference.
11 changes: 10 additions & 1 deletion latex-guide/README.md
@@ -1,6 +1,15 @@
# 10 Things LaTeX Guide

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/egap/methods-guides/tree/latex-guide/HEAD?urlpath=lab)
Here is a link to the online JupyterLab system, which gives people a Unix
command line where they can practice using LaTeX:

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/egap/methods-guides/HEAD?urlpath=lab/tree/latex-guide){target="_blank"}

## To Contribute

This guide is written in markdown with no R code. However, since the rest of the
methods guides use R Markdown, you can still convert it from markdown to html as
follows.

To build the html file for the EGAP Guide, use R to convert the markdown to html so that the styling matches the other [EGAP Methods Guides](https://egap.org/methods-guides/) (see also <https://github.com/egap/methods-guides>).
