-
Notifications
You must be signed in to change notification settings - Fork 4
/
twolevelrand.Rmd
224 lines (168 loc) · 7.28 KB
/
twolevelrand.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
---
title: Two level randomization
author: Jake
date: 4 Dec 2015
---
This file produces a two-level randomization to illustrate the general ideas from
Ichino and Schuendeln (2012) which in turn draws from Sinclair, McConnell,
Green (2012).^[To make a html document from this file, do `library(rmarkdown);render("twolevelrand.Rmd")` or click on the `Knit HTML` in
RStudio --- see the help file for `render` to make a pdf document.]
```{r include=FALSE, cache=FALSE}
# Some customization. You can alter or delete as desired (if you know what you are doing).
# knitr settings to control how R chunks work.
## To make the html file do
## render("twolevelrand.Rmd",output_format=html_document(fig_retina=FALSE))
## To make the pdf file do
## render("twolevelrand.Rmd",output_format=pdf_document())
require(knitr)
opts_chunk$set(
tidy=FALSE, # display code as typed
size="small", # slightly smaller font for code
echo=TRUE,
results='markup',
strip.white=TRUE,
fig.path='figs/fig',
cache=FALSE,
highlight=TRUE,
width.cutoff=132,
size='footnotesize',
out.width='.9\\textwidth',
message=FALSE,
comment=NA,
fig.retina=FALSE)
```
```{r getready}
.libPaths("libraries")
library(RItools)
source("code/two-level-sampler.R")
```
Imagine that we have 10 large upper-level units (like counties or states) and
each of these upper-level units have 50 lower-level units within it. Here, for
an example, half of the upper-level units will be assigned to contain treated
lower-level units. And have of the individuals within the treatment eligible
lower-level units will be assigned treatment.
First, set up the values in the experiment. The units:
- states: 10 of them, each containing 4 counties, randomly assign one county to contain treated and control farmers, the other three counties only contain control farmers.
- counties: 10*4=40 of them, each containing between 10 and 50 farmers.
- farmers: within treated counties, assign half of the farmers to treatment.
```{r}
set.seed(20151204)
numStates<-10 ## like states
countiesPerState<-4
stateSizes <- rep( countiesPerState , numStates)
countySizesMat<-sapply(stateSizes,function(ncounties){
trunc(runif(ncounties,min=10,max=50))
})
colnames(countySizesMat)<-paste("state",1:numStates,sep="")
rownames(countySizesMat)<-paste("county",1:countiesPerState,sep="")
countySizes<-as.vector(countySizesMat)
N<-sum(countySizes)
farmerid<-1:N
countyid<-rep(rep(1:countiesPerState,numStates),countySizes)
stateid<-rep(1:numStates,colSums(countySizesMat))
statecountyid<-paste(stateid,countyid,sep=".")
## Make a fake dataset:
dat<-data.frame(farmerid=farmerid,
countyid=countyid,
stateid=stateid)
dat$statecountyid<-factor(statecountyid,
levels=unique(statecountyid))
## Make sure that we can re-create the list of county sizes (number of farmers within counties)
stopifnot(all.equal(as.vector(table(dat$statecountyid)),countySizes))
## How many people within each county (first, second, third ... )
table(dat$countyid)
## How many people within each state
table(dat$stateid)
```
Add some fake covariates to demonstrate randomization testing below. So far we are not blocking, but I imagine that we would just run the assignment separately within each block.
```{r}
## A variable with constant variance but different means by country
countymeans<-rnorm(length(countySizes))
dat$x1<-unsplit(lapply(split(dat,dat$statecountyid),function(d){
rnorm(nrow(d),mean=countymeans) }),dat$statecountyid)
## A binary variable with no relationship to county per se
dat$x2<-rbinom(nrow(dat),size=1,prob=.33)
## The size of the county as a covariate
dat$countySize<-unsplit(table(dat$statecountyid),statecountyid)
```
Define the number of counties eligible for any treatment assignment within each state. Here, with 4 counties in the experiment per state, we assign only one of them to receive treatments.
```{r}
numTrtedState <- rep(1,length(stateSizes))
```
Define the number of farmers within each county eligible for treatment assignment. Here, for now, assign half of the farmers in each treatment-eligible county to treatment.
```{r}
numTrtedCounty <- floor(countySizes/2)
```
Second, define the "sampler" (i.e. the randomizer, the function which does the random assignment).
```{r}
twoLevelSampler <- upperLowerSampler(upperBlockSize = stateSizes,
treatedUpper = numTrtedState,
lowerBlockSize = countySizes,
treatedLower = numTrtedCounty)
```
Third, try it out. The command `twoLevelSampler(1)` generates one random
assignment following that design. If we wanted lots of random assignments
because we were using re-randomization for covariate balance or to do
statistical testing or assessment of our statistical procedures, we would use
numbers other than 1 in the parenthesis.
```{r}
set.seed(20151204)
Z1<-twoLevelSampler(1)
str(Z1)
dat$Z1<-Z1$samples
```
Fourth, test the code. What should be true if this randomization followed the design?
One county and only one county per state should have any treated units:
```{r}
test1tabs<-lapply(split(dat,dat$stateid),function(d){
table(d$Z1,d$countyid) })
## Each state should have only 1 county with any treated units
test1<-sapply(test1tabs,function(tab){ sum(tab[2,]!=0) })
stopifnot(all(test1==1))
```
Within treated counties, roughly half of the farmers should be assigned to
treatment. The following table shows only one county per state assigned to any
treatment, and roughly half of the farmers in that county assigned to
treatment.
```{r}
tmp<-with(dat,table(Z1,statecountyid))
tmp
test2tab<-tmp[,tmp["1",]!=0]
test2<-test2tab[2,]/colSums(test2tab)
test2
## We cannot guarantee exactly half, so just make sure it is around .5
stopifnot(all(test2>.4 & test2<.6))
```
# Randomization Balance Assessment
Assess whether or not the randomization is balanced (it really should be, and
should pass the other tests, too, but it is reasonable to check.) This is a bit
complicated because of the two-level randomization. However, we can get some
rough ideas in a couple of ways. First, we can look at certain covariates alone
using the randomization distribution arising directly from re-assigning
treatment following the two level design above 1000 times.
```{r}
ritx1<-RItest(y=dat$x1, z=dat$Z1, test.stat=mean.difference, sampler=twoLevelSampler, samples=1000)
ritx2<-RItest(y=dat$x2, z=dat$Z1, test.stat=mean.difference, sampler=twoLevelSampler, samples=1000)
## P-values from the randomization distribution
c(ritx1, ritx2)
```
Second, we can use a Normal theory/Large-sample approximation if we pretended
that we didn't have two level randomization but one level blocked by county
within state:
```{r}
xb1<-xBalance(Z1~x1+x2,strata=list(statecounty=~statecountyid),
data=dat,report="all")
```
First, look at the omnibus test: across all covariates, how much information do we have against the hypothesis that this design was randomized. (The design, is that farmers were randomly assigned treatment within counties and states with equal probability.)
```{r}
xb1$overall
```
Next look for some more descriptive information about whether the covariates
were balanced. Not looking at the one-by-one p-values except as descriptive
information (since, often, we have many covariates and don't want to look at
many many tests).
```{r}
xb1$results
```
# Estimation of treatment effects
# Testing of hypotheses about causal effects