
Commit

malisi committed Jun 16, 2022
2 parents 46f92e0 + da8d3a7 commit 7a135ae
Showing 31 changed files with 1,721 additions and 4,355 deletions.
12 changes: 12 additions & 0 deletions .github/workflows/binder.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
name: Build and Deploy Binder Image Currently for Latex Guide
on: [push]

jobs:
Create-MyBinderOrg-Cache:
runs-on: ubuntu-latest
steps:
- name: cache binder build on mybinder.org
uses: jupyterhub/repo2docker-action@master
with:
NO_PUSH: true
MYBINDERORG_TAG: ${{ github.event.ref }} # This builds the container on mybinder.org from the branch that was pushed.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@
.DS_Store
*/*_cache*
*settings.dcf
R-dev/
4 changes: 4 additions & 0 deletions Makefile
@@ -18,6 +18,7 @@ all: adaptive/adaptive.html \
meta-analysis/meta-analysis.html \
missing_data/missing_data.html \
multiple-comparisons/multiple-comparisons.html \
multisite/multisite.html \
null/null_results.html \
pap/pap.html \
pilots/10_things_to_know_about_pilots.html \
@@ -106,6 +107,9 @@ missing_data/missing_data.html: missing_data/missing_data.Rmd
multiple-comparisons/multiple-comparisons.html: multiple-comparisons/multiple-comparisons.Rmd
Rscript -e "rmarkdown::render('./multiple-comparisons/multiple-comparisons.Rmd')"

multisite/multisite.html: multisite/multisite.Rmd multisite/refs.bib
Rscript -e "rmarkdown::render('./multisite/multisite.Rmd')"

null/null_results.html: null/null_results.Rmd
Rscript -e "rmarkdown::render('./null/null_results.Rmd')"

23 changes: 23 additions & 0 deletions NotUsedDockerfile
@@ -0,0 +1,23 @@
# Trying this approach with a pre-built Docker image
# following https://mybinder.readthedocs.io/en/latest/tutorials/dockerfile.html
# Note that there must be a tag
FROM jwbowers/methods-guides:d177a56ed870

ARG NB_USER=jovyan
ARG NB_UID=1000
ENV USER ${NB_USER}
ENV NB_UID ${NB_UID}
ENV HOME /home/${NB_USER}

RUN adduser --disabled-password \
--gecos "Default user" \
--uid ${NB_UID} \
${NB_USER}

# Make sure the contents of our repo are in ${HOME}
COPY . ${HOME}
USER root
RUN chown -R ${NB_UID} ${HOME}
USER ${NB_USER}

RUN python3 -m pip install --no-cache-dir notebook jupyterlab
2 changes: 1 addition & 1 deletion hte/heteffects.Rmd
@@ -287,7 +287,7 @@ FDR control tends to be less conservative than FWER control and is popular in fi

8 Use a Pre-Analysis Plan To Reduce the Number of Hypothesis Tests
==
You can also reduce the numbers of CATEs and interactions under consideration for hypothesis testing by pre-specifying the tests of primary interest in a registered pre-analysis plan (PAP). Additional subgroup analyses can be conceptualized and specified as exploratory or descriptive analyses in the PAP. Another bonus is that if you prefer a one-sided test, you can commit to that choice in the PAP before seeing the outcome data, so that you "cannot be justly accused of cherry-picking the test after the fact" (Olken 2015).^[Benjamin A. Olken (2015), "Promises and Perils of Pre-Analysis Plans," _Journal of Economic Perspectives_ 29(3): 61--80.]
You can also reduce the numbers of CATEs and interactions under consideration for hypothesis testing by pre-specifying the tests of primary interest in a registered pre-analysis plan (PAP). Additional subgroup analyses can be conceptualized and specified as exploratory or descriptive analyses in the PAP. Another bonus is that if you prefer a one-sided test, you can commit to that choice in the PAP before seeing the outcome data, so that you "cannot be justly accused of cherry-picking the test after the fact" (Olken 2015).^[Benjamin A. Olken (2015), "Promises and Perils of Pre-Analysis Plans," _Journal of Economic Perspectives_ 29(3): 61--80.] See our guide [10 Things to Know About Pre-Analysis Plans](https://egap.org/resource/10-things-to-know-about-pre-analysis-plans/) for more on pre-registration.

9 Automate the Search for Interactions
==
8 changes: 7 additions & 1 deletion hte/heteffects.html
@@ -266,6 +266,7 @@




<style type="text/css">
.main-container {
max-width: 940px;
@@ -287,6 +288,9 @@
summary {
display: list-item;
}
details > summary > p:only-child {
display: inline;
}
pre code {
padding: 0;
}
@@ -872,7 +876,9 @@ <h1>8 Use a Pre-Analysis Plan To Reduce the Number of Hypothesis
descriptive analyses in the PAP. Another bonus is that if you prefer a
one-sided test, you can commit to that choice in the PAP before seeing
the outcome data, so that you “cannot be justly accused of
cherry-picking the test after the fact” (Olken 2015).<a href="#fn13" class="footnote-ref" id="fnref13"><sup>13</sup></a></p>
cherry-picking the test after the fact” (Olken 2015).<a href="#fn13" class="footnote-ref" id="fnref13"><sup>13</sup></a> See our guide <a href="https://egap.org/resource/10-things-to-know-about-pre-analysis-plans/">10
Things to Know About Pre-Analysis Plans</a> for more on
pre-registration.</p>
</div>
<div id="automate-the-search-for-interactions" class="section level1">
<h1>9 Automate the Search for Interactions</h1>
1 change: 1 addition & 0 deletions hte/heteffects_cache/html/__packages
@@ -0,0 +1 @@
base
52 changes: 25 additions & 27 deletions hypothesistesting/hypothesistesting.Rmd
@@ -131,29 +131,25 @@ inference."

## An overview of estimation based approaches to causal inference in randomized experiments.

There are three main ways that the statistical sciences have engaged with this
problem. That is, when asked, "Does information cause people to pay their
There are three main ways that the statistical sciences have engaged with the fundamental
problem of causal inference. All of these ways involve changing the target of inference. That is, when asked, "Does information cause people to pay their
taxes?" we tend to say, "We cannot answer that question directly. However, we can
answer a related question." [Ten Types of Treatment Effect You Should Know About](https://egap.org/resource/10-types-of-treatment-effect-you-should-know-about/) describes an insight that we credit to Jerzy Neyman
where a scientist can **estimate average causal effects** in a randomized
experiment even if individual causal effects are unobservable. Judea
Pearl's work on estimating the conditional probability of an outcome
based on a causal model of that outcome is similar to this idea, where a focus
is on the conditional probabilities of the $y_{i}$'s. That is, those two approaches answer the
fundamental causal question by changing the question focus on averages
or conditional probabilities. A related approach, from Don Rubin begins by **predicting** the
individual level potential outcomes using background information and a
probability model of $Z_i$ (such that, say, $Z \sim \text{Bernoulli}(\pi)$) and a
probability model of the two potential outcomes such that, say,
$(y_{i,1},y_{i,0}) \sim \text{Multivariate Normal}(\bbeta \bX, \bSigma)$ with a vector of coefficients $\bbeta$, an $n \times p$ matrix of variables $\bX$ (containing both treatment assignment and other variables and a $p \times p$ variance-covariance matrix $\Sigma$ describing how all of the columns in $\bX$ relate to one another).

The second general approach starts with such probability models relating treatment, other variables, and outcomes to each other, and
combines them using Bayes Rule to produce posterior distributions for
answer a related question."

The first approach changes the question from whether information causes a particular person to pay her taxes to whether information causes people to pay their taxes *on average*. [Ten Types of Treatment Effect You Should Know About](https://egap.org/resource/10-types-of-treatment-effect-you-should-know-about/) describes how a scientist can **estimate average causal effects** in a randomized experiment even though individual causal effects are unobservable. This insight is credited to Jerzy Neyman. Judea
Pearl's work on estimating the conditional probability distribution of an outcome based on a causal model of that outcome is similar to this idea. Both approaches address the
fundamental problem of causal inference by changing the question to focus on averages
or conditional probabilities rather than individuals.

A related approach that is due to Don Rubin involves **predicting** the
individual level potential outcomes. The predictions are based on a
probability model of treatment assignment $Z_i$ (for example, $Z \sim \text{Bernoulli}(\pi)$) and a
probability model of the two potential outcomes (for example,
$(y_{i,1},y_{i,0}) \sim \text{Multivariate Normal}(\bX \bbeta, \bSigma)$ with a vector of coefficients $\bbeta$, an $n \times p$ matrix of variables $\bX$ (containing both treatment assignment and other background information) and a $p \times p$ variance-covariance matrix $\bSigma$ describing how all of the columns in $\bX$ relate to one another). The probability models relate treatment, background information, and outcomes to one another. The approach combines these models with data using Bayes' Rule to produce posterior distributions for
quantities like the individual level treatment effects or average treatment
effects (see [@imbens2007causal] for more on what they call the Bayesian
Predictive approach to causal inference). So, the predictive approach changes
the fundamental question from one about averages to one that focuses on differences in predicted potential
outcomes for each person (although mostly these individual predicted differences are summarized using characteristics of the posterior distributions implied by the probability models and the data like the average of the predictions.)
Predictive approach to causal inference). So, this predictive approach focuses not on averages but on differences in predicted potential
outcomes for each person (although in practice these individual predicted differences are summarized using characteristics of the posterior distributions, like the average of the predictions).

## Hypothesis testing is a statistical approach to the fundamental problem of causal inference using claims about the unobserved.

@@ -237,9 +233,11 @@ per person, depending on which treatment was assigned to that person. So, we
can link the unobserved counterfactual outcomes to an observed outcome ($Y_i$)
using treatment assignment ($Z_i$) like so:

$$ Y_i = Z_i y_{i,1} + (1 - Z_i) y_{i,0} $$ {#eq:identity}
\begin{equation}
Y_i = Z_i y_{i,1} + (1 - Z_i) y_{i,0} (\#eq:identity)
\end{equation}

@eq:identity says that our observed outcome, $Y_i$ (here, amount of
\@ref(eq:identity) says that our observed outcome, $Y_i$ (here, amount of
taxes paid by person $i$), is $y_{i,1}$ when the person is assigned to the
treatment group ($Z_i=1$), and $y_{i,0}$ when the person is assigned to the
control group.
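The switching identity above can be illustrated numerically. Here is a minimal sketch in Python (the guide's own code is in R; the potential outcomes, assignment vector, and variable names below are made up for illustration):

```python
# Sketch of the switching equation Y_i = Z_i * y_{i,1} + (1 - Z_i) * y_{i,0}.
# Potential outcomes y0, y1 and assignment Z are illustrative made-up numbers.
y0 = [10, 20, 30, 40]  # taxes paid if person i were assigned to control
y1 = [15, 25, 35, 45]  # taxes paid if person i were assigned to treatment
Z = [1, 0, 1, 0]       # randomized treatment assignment

# The observed outcome reveals exactly one potential outcome per person.
Y = [z * a + (1 - z) * b for z, a, b in zip(Z, y1, y0)]
print(Y)  # -> [15, 20, 35, 40]
```

Note that for each person only one of the two potential outcomes ever appears in `Y`; the other remains counterfactual, which is the fundamental problem of causal inference in miniature.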
@@ -251,7 +249,7 @@

Let us entertain this model for the sake of argument. What would this
hypothesis imply for what we observe? We have the equation relating observed
to unobserved in @eq:identity so, this model or hypothesis would
to unobserved in \@ref(eq:identity) so, this model or hypothesis would
imply that:

$$ \begin{aligned} Y_i & = Z_i ( y_{i,0} + \tau_i ) + ( 1 - Z_i) y_{i,0} \\
@@ -458,8 +456,8 @@ mtext(side=3,outer=TRUE,text=expression(paste("Distributions of Test Statistics
```

To formalize the comparison between observed and hypothesized, we can
calculate the proportion of the hypothetical experiments that yield test
statistics greater than the observed experiment. In the left panel of the
calculate a $p$-value, i.e., the proportion of the hypothetical experiments that yield test
statistics greater than or equal to that of the observed experiment. In the left panel of the
figure we see that a wide range of
differences of means between treated and control groups are compatible with
the treatment having no effects (with the overall range between `r min(possibleMeanDiffsH0)` and `r max(possibleMeanDiffsH0)`). The right panel shows
@@ -469,7 +467,7 @@

### One-sided $p$-values

The one-sided $p$-values are `r pMeanTZ` for the simple mean difference and `r signif(pMeanRankTZ,2)` for the mean difference of the rank-transformed outcomes. Each
One-sided $p$-values capture the probability that a test statistic is as big as or bigger than the observed one (upper $p$-value) or as small as or smaller than the observed one (lower $p$-value). Here, the one-sided $p$-values are `r pMeanTZ` for the simple mean difference and `r signif(pMeanRankTZ,2)` for the mean difference of the rank-transformed outcomes. Each
test statistic casts a different amount of doubt, or quantifies a different
amount of surprise, about the same null hypothesis of no effects. The outcome
itself is so noisy that the mean
@@ -515,7 +513,7 @@ mean( possibleMeanRankDiffsH0 <= observedMeanRankTZ ))

In this case the two-sided $p$-values are `r p2SidedMeanTZ` and `r p2SidedMeanRankTZ` for the simple mean differences and means differences of ranks respectively. We interpret them in terms of "extremity" --- we would only see an observed mean difference as far away from zero as the one manifest in our results roughly 18% of the time, for example.
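The "extremity" interpretation of a two-sided $p$-value can be sketched directly: count how often a hypothetical statistic lands at least as far from zero as the observed one. This Python fragment uses made-up numbers purely to illustrate the counting rule (the guide computes its two-sided $p$-values in R):

```python
# Two-sided p-value by "extremity": how often is a hypothetical test
# statistic at least as far from zero as the observed one?
sims = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.0, 1.5, 2.5]  # hypothetical statistics
observed = 1.5

p_two_sided = sum(abs(s) >= abs(observed) for s in sims) / len(sims)
print(p_two_sided)  # -> 0.5
```

Here four of the eight hypothetical statistics (-2.0, -1.5, 1.5, and 2.5) are at least as extreme as 1.5 in absolute value, so the two-sided $p$-value is 4/8.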

**As a side note** The test of the sharp null shown here can be done without
**As a side note:** The test of the sharp null shown here can be done without
writing the code yourself. The code that you'll see here (by clicking the code
buttons) shows how to use different R packages to test hypotheses using
randomization-based inference.
11 changes: 10 additions & 1 deletion latex-guide/README.md
@@ -1,6 +1,15 @@
# 10 Things LaTeX Guide

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/egap/methods-guides/tree/latex-guide/HEAD?urlpath=lab)
Here is a link to the online JupyterLab system, which gives people a Unix
command line where they can practice using LaTeX:

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/egap/methods-guides/HEAD?urlpath=lab/tree/latex-guide){target="_blank"}

## To Contribute

This guide is written in markdown with no R code. However, since the rest of the
methods guides use R Markdown, you can still convert it from markdown to html as
follows.

To build the html file for the EGAP Guide, use R to convert the markdown to html so that the styling matches the other [EGAP Methods Guides](https://egap.org/methods-guides/) (see also <https://github.com/egap/methods-guides>).
