-
Notifications
You must be signed in to change notification settings - Fork 6
/
day9_pre_post.Rmd
95 lines (73 loc) · 3.3 KB
/
day9_pre_post.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
# A pre-vs-post design?
## Why not use a pre-vs-post design for these data? {.allowframebreaks}
We have homocide rates in 2003, indicator of Metrocable station completion in
2004, and homocide rates in 2008. Why not use a difference-in-differences design
here? Or a lagged dependent variable approach (Ding and Li 2019, Blackwell and
Glynn 2018).
```{r echo=TRUE}
library(estimatr)
library(tidyverse)
## Reshape the data to "long" form
meddat_long <- meddat %>% dplyr::select(one_of("nh","nhTrt","HomRate08","HomRate03")) %>% pivot_longer(cols=c("HomRate08","HomRate03"),names_to="year",names_prefix="HomRate0",values_to="HomRate")
head(meddat_long)
meddat_long$year <- as.numeric(meddat_long$year) + 2000
head(meddat_long)
## Calc by hand
the_means <- meddat_long %>% group_by(nhTrt,year) %>% summarize(bary=mean(HomRate)) %>% ungroup()
the_means
the_diffs <- the_means %>% summarize(trt_diff=bary[nhTrt==1&year==2008] - bary[nhTrt==1&year==2003],
ctrl_diff=bary[nhTrt==0&year==2008] - bary[nhTrt==0&year==2003])
the_diffs
did0 <- with(the_diffs, trt_diff - ctrl_diff)
did0
## Calc with OLS
did2 <- lm_robust(HomRate~nhTrt*I(year==2008),meddat_long,cluster=nh)
did2
## Using the wide format data
the_diffs2 <- meddat %>% summarize(trt_diff=mean(HomRate08[nhTrt==1]) - mean(HomRate03[nhTrt==1]),
ctrl_diff = mean(HomRate08[nhTrt==0]) - mean(HomRate03[nhTrt==0]))
the_diffs2
did0_wide <- with(the_diffs2, trt_diff - ctrl_diff)
did0_wide
did3 <- lm_robust(I(HomRate08-HomRate03)~nhTrt,data=meddat)
did3
## Following Ding and Li 2019 to "bracket" the ATT using the DID and Lag DV
lag_dv <- lm_robust(HomRate08~nhTrt+HomRate03,data=meddat)
lag_dv
```
## Another pre-post-design
```{r, echo=TRUE}
## Make std diffs on baseline
baseline_dist <- match_on(nhTrt~HomRate03,data=meddat)
quantile(baseline_dist,seq(0,1,.1))
pm_hr03 <- pairmatch(baseline_dist + caliper(baseline_dist,.4),data=meddat,remove.unmatchables = TRUE)
stratumStructure(pm_hr03)
summary(unlist(matched.distances(pm_hr03,baseline_dist)))
meddat$pm_hr03 <- pm_hr03
baltest_hr03 <- balanceTest(nhTrt~HomRate03+strata(pm_hr03),data=meddat,subset=!is.na(pm_hr03))
baltest_hr03$overall
grpdat <- meddat %>% filter(!is.na(pm_hr03)) %>%
group_by(pm_hr03) %>%
summarize(diffHR03=mean(HomRate03[nhTrt==1]) - mean(HomRate03[nhTrt==0])) %>%
arrange(diffHR03)
```
```{r}
grpdat
## Within set very homogeneous on baseline homocide rates:
est1<- lm_robust(HomRate08~nhTrt,fixed_effects = ~pm_hr03,data=meddat,subset=!is.na(pm_hr03))
est1
## Look only at the difference
est2<- lm_robust(I(HomRate08-HomRate03)~nhTrt,fixed_effects = ~pm_hr03,data=meddat,subset=!is.na(pm_hr03))
est2
```
## Questions to answer with a pre-vs-post design for these data
Some questions to answer:
- What justifies claims about unbiasedness or at least consistency of the
estimator of the Average effect of treatment on the treated (the ATT or Effect
of Treatment on the Treated (ETT))?
- What justifies use of a Normal (or t-distribution) as a description of the
hypothesize of no average effects? (It is **not** an approximation to an
as-if-randomized distribution. Is this a random sample from a population? Do we
believe that homocide rates arise from an IID Normal DGP?)
- What if we wanted to test another hypothesis or use a different test statistic?
Say, the sharp null of no effects using a rank based test?