---
title: The OES Guide to Making and Evaluating Matched Designs (DRAFT!!)
author: Jake Bowers and
date: '`r format(Sys.Date(), "%B %d, %Y")`'
bibliography: big.bib
execute:
  echo: true
  output: true
  message: false
  warning: false
format:
  html:
    toc: true
    number-sections: true
    embed-resources: true
    code-fold: false
    fig-caption: true
    code-links:
      - text: Github
        icon: github
        repo: https://github.com/bowers-illinois-edu/OES-Matching-Guide
---
```{r loadlibraries}
#| code-fold: true
#| code-summary: "Code to load libraries and general setup"
if (!require(renv)) {
  install.packages("renv", method = "wget")
}
library(tidyverse)
library(RItools)
library(optmatch)
library(designmatch)
library(highs)
library(coin)
library(conflicted)
library(estimatr)
library(senstrat)
library(sensemakr)
conflicts_prefer(dplyr::select)
conflicts_prefer(dplyr::filter)
options(digits = 4, scipen = 8)
```
# Overview
We create matched research designs to address alternative explanations. Say we
have a policy intervention, $Z$, that we code as $Z_i=0$ for people, $i$, who
did not experience the intervention and $Z_i=1$ for those who did experience
the intervention, and an outcome that we observe for each person, $Y_i$. We
would like to know whether the outcome among people who experienced the
intervention ($Y_i|Z_i=1$) is different from the outcome among those who did
not experience the intervention ($Y_i|Z_i=0$) **because of the intervention
rather than some other factor, $x_i$**. A matched design will only compare
$Y_i|Z_i=1$ with $Y_i|Z_i=0$ **within strata of people who are similar on their
values of $x_i$ in an effort to set aside $x_i$ as an explanation of the
difference in outcomes.**
For example, say the intervention is a job training program, the outcome is
income, and we see that people who took the job training program have higher
incomes than people who did not. One explanation for the difference could be
that the job training program helped people earn more. An alternative
explanation might be that older people, who tend to earn higher wages than
younger people because they have more workforce experience, were also more
likely to sign up for the job training program. We can address this alternative
explanation by creating strata of people who have the same age and only
comparing incomes between those who took the training program and those who did
not among people of the same age. For example, we might write the difference in
average income between job training alumni and those who did not take job
training among people who are 18 years old as
$\bar{Y}=(\bar{Y}|Z=1,X=18)-(\bar{Y}|Z=0,X=18)$. Within that stratum of 18 year
olds, there are no differences in age. So, comparisons in income between those
with the job training and without cannot be interpreted as arising from
differences in age (unless age differences of months or weeks matter). If age
differences tend to matter only when they amount to years, then we can create a
research design that **adjusts for this alternative explanation** by
restricting our comparisons to people who are the same age: creating strata of
18 year olds, 19 year olds, and so on.
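As a minimal sketch of this idea (using a tiny invented dataset, not the data analyzed in this guide), exact stratification on age amounts to computing the treated-minus-control difference in mean outcomes separately within each age value:

```{r age_strata_sketch}
## Tiny invented dataset: two age strata, four people each
toy <- data.frame(
  age = rep(c(18, 19), each = 4),
  Z = rep(c(0, 1), times = 4),
  Y = c(10, 12, 11, 14, 20, 25, 19, 24)
)
## Treated-minus-control mean difference within each age stratum
sapply(split(toy, toy$age), function(d) {
  mean(d$Y[d$Z == 1]) - mean(d$Y[d$Z == 0])
})
```

Within each stratum, age is held constant, so age cannot explain the within-stratum differences; an overall estimate then combines the stratum-specific differences with weights, as we show below.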
In this memo, we discuss generalizations of this basic approach: from exactly
matching on a single variable to exclude alternative explanations based on that
variable alone, to matching on a number of variables all at once in the hopes
that we might set aside a number of alternative explanations all together.
Although we eventually will want to estimate values for and test hypotheses
about causal effects, this creation of a matched design is a research design
step because it can be done without knowledge of the values for the outcome
variables.^[It is useful to know which units have missing data on the outcomes,
but it is not necessary to know the values of the outcomes.] This separation of
design from outcome analysis allows us to use the data to make design decisions
without worrying about harming the false positive rates of our tests. Those who
wish to follow up on the general ideas presented here might like to see
@rosenbaum2020modern, @rosenbaum2020book.
This memo is also a code tutorial, showing examples throughout. It currently
uses only R because that is the language in which those who have been
developing modern approaches to the creation of matched designs are working.
# Create Data
To make this guide portable, we are making up a fake dataset with covariates
that help determine the probability of treatment.
```{r makedata}
#| echo: true
#| output: false
N <- 1000
set.seed(12345)
dat <- data.frame(id = 1:N)
dat <- dat %>% mutate(
sso = sample(1:10, size = N, replace = TRUE),
age = rnorm(n = N, mean = 45, sd = 10),
sex = factor(sample(c("M", "F"), size = N, replace = TRUE)),
race = factor(sample(c("A", "B", "W"), size = N, replace = TRUE, prob = c(.05, .15, .8))),
u = rnorm(n = N, mean = sso, sd = exp(sso)),
demog_latent0 = .5 * sd(u) * age + .2 * sd(u) * as.numeric(sex) + .2 * sd(u) * as.numeric(race) + u,
demog = (demog_latent0 - min(demog_latent0)) / (max(demog_latent0) - min(demog_latent0)),
edu = factor(sample(letters[1:8], size = N, prob = runif(n = 8, min = .1, max = pmax(.2, demog)), replace = TRUE)),
grade = sample(1:6, size = N, replace = TRUE),
pay = abs(rnorm(n = N, mean = (demog + .01) * 50000 + 50000, sd = 50000)),
tenure = rnorm(n = N, mean = 10, sd = 3),
perf = runif(n = N, min = 1, max = 10),
preY0 = abs(.2 * demog + rnorm(N)),
preY = (preY0 - min(preY0)) / (max(preY0) - min(preY0)),
trtcont = rnorm(n = N, mean = demog + preY, sd = sd(demog + preY)),
## trtbin = rbinom(n = N, size = 1, prob = pmin(.95,demog+preY)),
trtbin = as.numeric(trtcont >= quantile(trtcont, 2 / 3)),
postY = .2 * sd(preY) * trtbin + preY + rnorm(N)
)
## optmatch will need row names
row.names(dat) <- dat$id
## Just verifying that trtbin is related to demog and preY
## table(dat$trtbin, exclude = c())
## with(dat, boxplot(demog ~ trtbin))
## with(dat, boxplot(preY ~ trtbin))
## Check the relationship between demog and its components. Not really using demog directly below.
## The idea is to ensure some relationships among the covariates using demog.
lm1 <- lm(demog ~ sso + age + sex + race + edu, data = dat)
# summary(lm1)
lm2 <- lm(preY ~ demog, data = dat)
# summary(lm2)
# cor(dat[, c("age", "demog", "pay", "preY")])
```
Our goal is to assess the effect of some observed intervention (that we call
`trtbin` here for the binary version or `trtcont`) on some outcome (that we
call `postY`). We worry that a naive description of the relationship between
intervention and outcome would not provide useful policy guidance because any
such simple two-way relationship might contain other relationships (say, those
driven by age or education in this example). From a policy standpoint we can
change the intervention but we cannot change the education or age of people, so
we want to remove the influence of age and education (among other relatively
fixed characteristics of people) from our description. That is, we want to be
able to **interpret** the relationship we calculate between intervention and
outcome as reflecting, as much as possible, only the intervention and the
outcome and not age and education.^[We borrow the idea of "interpretable
comparisons" from @kinder1993behalf.]
# The simplest matched design: 2 strata, 1 variable
As a preview of where we are going, we show what is, in essence, a matched
design, but one that has only 2 strata and uses only 1 variable. Here we will
also show the outcome analysis, just to be clear about how stratification can
enable adjustment or substitute for linear additive "controlling for".
Say that we want to remove the effect of sex from the relationship between the
intervention and the outcome. The simplest and most transparent way to do this
would be to calculate the relationship between the intervention and the outcome
only among men and only among women (imagining that our administrative data do
not yet record non-binary gender).
Below we show this most basic approach to using stratification to remove or
control for the effects of a single variable with two levels, as a way to
introduce multivariate optimal matching later. What we see here is that the
effect among men is different from the effect among women, and that the effect
among each sex cannot be confounded by sex --- after all, the variable `sex` is
held perfectly constant.
```{r two_strata_example}
with(dat, table(treated=trtbin, sex=sex, exclude = c()))
lm_sexM <- lm(postY ~ trtbin, data = dat, subset = sex == "M")
lm_sexF <- lm(postY ~ trtbin, data = dat, subset = sex == "F")
coef(lm_sexM)[["trtbin"]]
coef(lm_sexF)[["trtbin"]]
```
Now, these are two descriptions of the relationship of interest, but we tend to
think about stratification as a way to *remove* confounding relationships from
*a single overall* relationship. How do we calculate this? We show this in
steps here --- first doing it very simply by hand, and then using "fixed
effects". We will return to these ideas after creating matched designs using
`optmatch`, `designmatch`, or another approach to stratification, since this is
the same way that we will estimate effects after creating hundreds of matched
pairs or sets later.
We know how to analyze a block-randomized (or strata-randomized) experiment
[@gerber2012field]: each block is a mini-experiment. We *estimate the ATE
within each block* and *combine by weighting each block-specific estimate*.
The *block-size weight* produces an unbiased estimator in randomized
experiments --- in an observational study we do not know about the bias since
we do not exactly know how to repeat the study. The *precision weight* (a.k.a.
the "fixed effects" weight) tends to produce smaller standard errors and
confidence intervals but is biased for the overall ATE even in randomized
experiments. In standard practice with matched designs we use the *precision
weight* approach because (1) we tend to have matched sets that are nearly the
same size (i.e., pairs), which means that the block-size-weight and
precision-weight approaches tend to give nearly the same answer; (2) we can
calculate the precision-weighted results quickly using the fact that "fixed
effects" in linear regression models create precision weights; and (3) we know
that our estimators are biased anyway, because we neither observe all relevant
confounders nor know the functional form that relates them to the treatment.
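In symbols: write $n_b$ for the number of people in stratum $b$, $n$ for the total sample size, $p_b$ for the proportion treated in stratum $b$, and $\widehat{\text{ATE}}_b$ for the within-stratum difference in mean outcomes (notation introduced here for exposition; it mirrors `nb`, `prob_trt`, `nbwt`, and `prec_wt` in the chunks nearby). Then the two estimators are weighted averages:

$$
\widehat{\text{ATE}}_{\text{block-size}} = \sum_{b} \frac{n_b}{n}\,\widehat{\text{ATE}}_b,
\qquad
\widehat{\text{ATE}}_{\text{precision}} = \sum_{b} \frac{n_b\, p_b (1-p_b)}{\sum_{b'} n_{b'}\, p_{b'} (1-p_{b'})}\,\widehat{\text{ATE}}_b.
$$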
```{r weighting}
## First collapse the data to the level of the strata and calculate weights
dat_sets <- dat %>%
group_by(sex) %>%
summarize(
nb = n(),
ateb = mean(postY[trtbin == 1]) - mean(postY[trtbin == 0]),
prob_trt = mean(trtbin), ## proportion treated in the strata
nbwt = n() / nrow(dat), ## proportion of the total data in the strata
prec_wt = nbwt * prob_trt * (1 - prob_trt),
)
dat_sets$prec_wt_norm <- with(dat_sets, prec_wt / sum(prec_wt))
## The two groups of people with the treatment estimated within each set along with some weights
dat_sets %>% select(sex, nb, ateb, prob_trt, nbwt, prec_wt_norm)
## The overall estimation using the two kinds of weights
est_ate_nw <- with(dat_sets, sum(ateb * nbwt))
est_ate_pw <- with(dat_sets, sum(ateb * prec_wt_norm))
c(block_size_wt = est_ate_nw, precision_wt = est_ate_pw)
```
Strata-level weights can also be represented at the individual level --- and
this allows us to use linear models (least squares) to produce block-weighted
estimates of the overall average causal effect after "holding constant" sex.
```{r make_strata_level_data}
## Now creating the weights at the individual level
dat <- dat %>%
group_by(sex) %>%
mutate(
nb = n(),
mb = sum(trtbin),
ateb = mean(postY[trtbin == 1]) - mean(postY[trtbin == 0]),
prob_trt = mean(trtbin),
nbwt = (trtbin / prob_trt) + (1 - trtbin) / (1 - prob_trt),
prec_wt = nbwt * prob_trt * (1 - prob_trt)
) %>%
ungroup()
## Three ways to use the block-size weight
est_ate1a <- difference_in_means(postY ~ trtbin, blocks = sex, data = dat)
est_ate1b <- lm_robust(postY ~ trtbin, weights = nbwt, data = dat)
est_ate1c <- lm(postY ~ trtbin, weights = nbwt, data = dat)
## Three other ways to use the precision or harmonic weight
est_ate2a <- lm_robust(postY ~ trtbin + sex, data = dat)
est_ate2b <- lm_robust(postY ~ trtbin, fixed_effects = ~sex, data = dat)
est_ate2c <- lm_robust(postY ~ trtbin, weights = prec_wt, data = dat)
```
Notice that we get the same result for the overall effect whether we first
calculate differences of means within strata and then take a weighted average
of those strata specific differences of means, or whether we hand off that job
to least squares. The only difference here is whether we choose to weight only
using the proportion of the data in the block, or whether we also want to take
into account the proportion taking the intervention within the strata.
```{r}
## Block-size weighted results
c(est_ate_nw, coef(est_ate1a)[["trtbin"]], coef(est_ate1b)[["trtbin"]], coef(est_ate1c)[["trtbin"]])
## Precision weighted results
c(est_ate_pw, coef(est_ate2a)[["trtbin"]], coef(est_ate2b)[["trtbin"]], coef(est_ate2c)[["trtbin"]])
```
## Summary of the section
So, we have seen that we can "control for" a single binary variable using
stratification. We didn't need to evaluate the stratification --- after all, we
just held sex perfectly constant within strata. If someone asked, "Does this
design remove sex as an alternative explanation?" we could answer simply, "Yes.
We only compare women with women and men with men." And we were able to produce
an estimate of an overall effect by calculating effects within strata and
combining them via weighting. And we showed that we can use OLS for this
purpose --- just as we would if we had a block-randomized experiment.
What should we do if we have more than one variable? Or if those variables have
more than a few values? We explain that below as we introduce optimal matching.
# Create a design, evaluate and iterate if needed
When we created our fake data we made two intervention variables, `trtbin` and
`trtcont`, because we want to demonstrate two kinds of approaches to
stratification here: **bipartite matching**, which creates sets or pairs of
people who differ in a binary variable (i.e., comparing those who received the
intervention with those who did not), and **non-bipartite matching**, which
creates pairs of people who differ in levels of a variable that has more than
two values, such that each pair has one person with a higher value and another
person with a lower value.
Imagine that we want to compare people who are similar in multiple demographic
characteristics as well as the baseline outcome (`preY`). We would like to
**create a research design** in which people are placed into strata where they
are as similar as possible to other people along all of those characteristics.
If we calculate effects within those strata, we know that we will have
minimized the effects of those characteristics on the outcome, even if we have
not exactly eliminated them as we did above when we exactly controlled for
"male" versus "female". In recent decades, algorithms to create strata that
minimize differences between people on those covariates have been implemented
in relatively user-friendly software (see the list in [@rosenbaum2020modern]).
In this guide we demonstrate the use of `optmatch` and `designmatch` because
each uses a slightly different approach to the minimization problem (both are
"optimal" because they solve the problem by minimizing).
We demonstrate here how to do this by (1) combining the many variables into
single scores via *dimension reduction* and then using the optimal matching
algorithms to find strata that minimize the within-strata distances between
people on that score (say, using a Mahalanobis distance score and/or a
propensity score) and (2) by directly constraining the algorithm (say,
restricting research designs to only combine people with fewer than 10 years
difference in age).
## Bipartite Matching with Optmatch
We will start with the task of comparing people based on their experience of a binary intervention (`trtbin`) using the `optmatch` package.
The general idea in optimal matching is to answer the question, "How shall we
break the data into strata?" with "We should choose strata within which
differences between units are minimized." Current software like `optmatch` and
`designmatch` create these strata via minimization using **distance matrices**.
That is, if we can represent differences between all of the treated and control
observations as some form of distances, then we can ask those optimal matching
algorithms to find strata that are optimal in the sense of there being no other
sets of strata where treated and control units are closer together on that
distance. For example, below we make a matrix of absolute differences between
each treated unit and each control unit in `preY`:
```{r abs_dist}
## Scalar/Absolute differences in baseline outcome
abs_dist <- match_on(trtbin ~ preY, data = dat, method = "euclidean")
abs_dist[1:3, 1:4]
## Notice that the distance between the treated unit labeled "1" and the control unit labeled "5" in the matrix above is simply the absolute difference:
abs(dat["1", "preY"] - dat["5", "preY"])
## Optmatch offers us some overview of this distance matrix
summary(abs_dist)
```
## Dimension reduction
When we convert many columns into one column, we reduce the dimensions of the
dataset (to one column). We can then use the idea of **multivariate distance**
to produce distance matrices, and ask the matching algorithms to minimize those
**multivariate distances**.
We defer in-depth explanation of these scores to [@rosenbaum2020book] for now.
We already created `abs_dist`, the distance matrix recording absolute
differences in the pre-intervention outcome between people in the treated
versus control groups. Below we create multivariate distances in two different
ways: a Mahalanobis distance (`mh_dist`) and a standardized distance on the
propensity score (`ps_dist`). The Mahalanobis distance matrix records
differences in Mahalanobis distance after transforming the covariates to ranks
and scaling (see section 9.3 of [@rosenbaum2020book] on this); `ps_dist`
records differences in predicted treatment, or "propensity score", on the
linear predictor scale after standardizing.
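For reference, the plain Mahalanobis distance between covariate vectors $x_i$ and $x_j$, with $\Sigma$ the sample covariance matrix of the covariates, is

$$
d_M(x_i, x_j) = \sqrt{(x_i - x_j)^{\top} \Sigma^{-1} (x_i - x_j)}.
$$

The rank-based version used below applies this idea after replacing each covariate by its ranks, which makes the distance resistant to outliers and heavy-tailed covariates.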
```{r mh_dists}
### Setup for multivariate distances
covs <- c("sso", "age", "sex", "race", "edu", "grade", "pay", "tenure", "preY")
cov_fmla <- reformulate(covs, response = "trtbin")
## A multivariate distance score: Mahalanobis distance on rank transformed covariates
mh_dist <- match_on(cov_fmla, data = dat, method = "rank_mahalanobis")
mh_dist[1:3, 1:4]
## person 213 is closer to person 1 in multivariate terms than person 104
mh_dist[1, c("104", "213")]
## Here we can compare those two people with person 1 along all of the covariates:
dat %>%
filter(id %in% c(1, 104, 213)) %>%
select(one_of(c("trtbin", "id", covs)))
summary(mh_dist)
```
This next chunk produces a different distance matrix. Whereas the Mahalanobis
distance aims to weight each covariate equally, the propensity distance weights
the influence of each covariate in the score depending on how predictive it is
of the treatment. We often use a logistic regression to produce these scores,
but logistic regressions often overfit the data, so below we use a version of
logistic regression with the coefficients shrunken towards zero. What really
matters here is that the covariates that are *relatively more predictive* than
others have larger coefficients in absolute value.
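As we understand `match_on` applied to a fitted propensity model, the resulting distance is the absolute difference in the linear predictors, standardized: for a treated unit $i$ and control unit $j$ with linear predictors $\hat{\eta}_i$ and $\hat{\eta}_j$,

$$
d_{ps}(i, j) = \frac{|\hat{\eta}_i - \hat{\eta}_j|}{\hat{\sigma}_{\eta}},
$$

where $\hat{\sigma}_{\eta}$ is a pooled standard deviation of the linear predictor (see the `optmatch` documentation for the exact scaling used).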
```{r ps_dist}
## A multivariate distance score: Propensity score using a logistic regression
## model that shrinks coefficients toward zero to avoid the separation problem
## common in logistic models
## We avoid loading the whole arm package (it brings in many things we don't need)
## and instead call arm::bayesglm directly
psmod <- arm::bayesglm(cov_fmla, data = dat, family = binomial())
## Inspect the coefs to make sure that none are too huge
zapsmall(coef(psmod))
ps_dist <- match_on(psmod, data = dat)
ps_dist[1:3, 1:4]
summary(ps_dist)
```
## Creating the stratification
Creating the stratification using the `pairmatch` and `fullmatch` commands in
the optmatch package just requires that the researcher provide an appropriately
formatted distance matrix to those functions.
Here is a paired design. The `pairmatch` command creates an object that is a
factor variable indicating which pair each individual belongs to. Below we see
that persons 1 and 3 (people who experienced the intervention) were placed into
pairs with persons 627 and 383 (people who did not experience the
intervention). These people do not look extremely similar, so we may want to
fine-tune this design using other information.
```{r pm1}
## A paired design
pm1 <- pairmatch(mh_dist, data = dat)
summary(pm1)
stopifnot(all.equal(names(pm1), row.names(dat)))
dat$pm1 <- factor(pm1)
dat %>%
filter(pm1 %in% c("1.1", "1.2")) %>%
select(one_of(c("pm1", "trtbin", "id", covs))) %>%
arrange(pm1, trtbin)
```
We can also ask for a match with a variable number of controls and treated
units in each set. Below we show a fully matched design in which no control
units are excluded. We restrict this design to have no more than 1 treated unit
per set and no more than 5 controls per set. (We can make other decisions to
generate a design that we like later: the decision to restrict the size of the
sets has mostly to do with the fact that the effective sample size of a
stratified research design with two treatment conditions grows as (a) we get
more strata and (b) the strata become more equal in the numbers of treated
versus control units. We say more about this below.)
```{r fm1}
## A full-matched design with no more than one treated unit per set and no more than 5 controls per set
fm1 <- fullmatch(mh_dist, min.controls = 1, max.controls = 5, data = dat)
summary(fm1, min.controls = 0, max.controls = Inf)
stopifnot(all.equal(names(fm1), row.names(dat)))
dat$fm1 <- factor(fm1)
dat %>%
filter(fm1 %in% c("1.1", "1.10")) %>%
select(one_of(c("fm1", "trtbin", "id", covs))) %>%
arrange(fm1, trtbin)
```
In each case the strata (pairs or matched sets) are named with levels like
"1.1" or "1.99", where these names are not numbers: "1.1" means "group 1, first
set". In our case we only have one overall group, so all of the sets start with
`1`; when we use exact matching below, the first element of the set name will
change.
### How to assess a matched design?
Did either of those designs do a good job? Should we feel confident using them
to remove the influence of the covariates? We have two ways to answer those
questions.^[Another question we should ask about a research design is whether
it provides enough information to help us discern differences should those
differences exist. We set aside this question of statistical power or
information for later. From the perspective of stratified designs, statistical
power or information or "effective sample size" is a function not only of the
total number of observations, but also of the number of strata and the variance
of both the treatment and the outcomes within strata. In the case of a binary
treatment, the variance of the treatment amounts to the proportion treated. In
a matched design that creates pairs, these considerations go away --- the
number of strata is maximized and each pair has exactly one treated and one
control unit. When creating full matched designs --- with varying numbers of
treated units and controls per set --- these issues become more important.]
First, we can answer that by referring to what we know about the units and the
theory of change and the alternative explanations that we are trying to engage.
For example, imagine that we really must remove the effect of age from our
comparisons in order for any reasonable interpretation of our analysis to shed
light on the effect of the intervention itself and not on age. We might then
calculate the difference in age within each set and ask whether any set has
such a large difference in age that we could not credibly claim to have
"Controlled for" age with this design overall.
```{r echo=FALSE, out.width=".9\\textwidth"}
library(gridExtra)
dat_plot1 <- dat %>% select(age,trtbin,fm1) %>%
group_by(fm1) %>%
mutate(agediff=abs(mean(age[trtbin==1]) - mean(age[trtbin==0])),
nb=n()) %>%
ungroup() %>%
mutate(fm1 = factor(fm1, levels = names(sort(tapply(agediff, fm1, mean),decreasing=TRUE))))
dat_plot1$nostrata <- rep(1, nrow(dat_plot1))
```
For example, we could make a plot to show this information within sets versus
raw differences from the full match. Here we show the worst sets (those in the
top decile of within-set age differences) in order of remaining difference in
mean age between the treated and control observations from left to right.
Notice that no set has nearly the age difference that we see overall (in the
boxplot at the far right): without matching we would be comparing people who
are `r min(dat$age)` with people who are `r max(dat$age)`, whereas after
matching the worst set has a difference of `r max(dat_plot1$agediff)`.
```{r makeplot1}
bpfm1 <- ggplot(dat_plot1 %>% filter(agediff > quantile(agediff,.9)), aes(x = fm1, y = age)) +
geom_boxplot() +
stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red", fill = "red") +
geom_text(aes(x=fm1,y=min(age),label=nb))+
annotate("text",x = 0, y = min(dat_plot1$age), label="Set \n size")
bpfm1 <- bpfm1 + scale_x_discrete(expand = expansion(add = c(2, 0.5))) ## expansion() replaces the deprecated expand_scale()
bpfm1
bporig <- ggplot(dat_plot1, aes(x = nostrata, y = age)) +
geom_boxplot() +
stat_summary(
fun = mean, geom = "point",
shape = 20, size = 3, color = "red", fill = "red"
)
grid.arrange(bpfm1, bporig, ncol = 2, layout_matrix = matrix(c(1, 1, 1, 1, 2), nrow = 1))
```
We can further reason about these differences by looking at them more closely.
Here we use the full matched design just for convenience. For example, we see
that the worst matched set has a difference of 33 years --- including a 15 year
old and a 49 year old. This might be a bad set depending on the context. Across
the sets, we see that half of them differ by less than 5.8 years of age, and
that 90% of the sets differ by less than about 13 years of age. Is this a sign
that we have done enough to remove the effects of age from the impact
evaluation? That depends on context. Let us imagine for now that we are not
satisfied with this design --- that we would like smaller differences on age,
and that we are also worried about some of the other variables. In fact, we
would like to evaluate the design as a whole. How might we do that?
```{r sdiffs, echo=FALSE}
rawmndiffs <- with(dat, mean(age[trtbin == 1]) - mean(age[trtbin == 0]))
setdiffsfm1 <- dat %>%
group_by(fm1) %>%
summarize(
mn_age_diffs =
mean(age[trtbin == 1]) - mean(age[trtbin == 0]),
mn_age= mean(age),
min_age= min(age),
max_age= max(age)
)
setdiffsfm1 %>% arrange(desc(abs(mn_age_diffs))) %>% head()
## summary(setdiffsfm1$mn_age_diffs)
quantile(abs(setdiffsfm1$mn_age_diffs), seq(0,1,.1))
```
## How to assess stratified research designs
We know that in **designs randomized within strata**:
- Randomization balances covariate distributions between treated and control
groups. It does not make them identical.
- We can repeat the known randomization to check the randomization procedure
(mostly useful if there is a long chain of communication between the random
number generator and the field). **We know how strata-weighted mean
differences in covariates would vary in an experiment.**
- **Randomization does not imply exact equivalence. Large differences in
covariates easily arise in small experiments.**
This means that we can compare **stratified observational studies** to
equivalent randomized experiments [@hansen2008cbs].
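To make that comparison concrete, here is a small self-contained sketch (with invented data and a helper function of our own, not part of this guide's analysis) of the randomization distribution that a balance test approximates: repeatedly re-randomize treatment within strata and recompute the mean covariate difference.

```{r perm_sketch}
## Invented toy data: 4 strata of 10 people, exactly half treated within each stratum
set.seed(1)
toy <- data.frame(b = rep(1:4, each = 10), x = rnorm(40))
rerandomize <- function(blocks) {
  ## assign exactly 5 treated within each block of 10, preserving row order
  unsplit(lapply(split(seq_along(blocks), blocks), function(i) sample(rep(0:1, 5))), blocks)
}
toy$z <- rerandomize(toy$b)
obs <- with(toy, mean(x[z == 1]) - mean(x[z == 0]))
## The distribution of the mean difference across within-strata re-randomizations
## (here all strata have the same size and proportion treated, so the simple
## mean difference coincides with the block-size-weighted one)
perm_dist <- replicate(2000, {
  zz <- rerandomize(toy$b)
  mean(toy$x[zz == 1]) - mean(toy$x[zz == 0])
})
## A two-sided permutation p-value; balanceTest approximates this kind of
## distribution with a large-sample Normal
mean(abs(perm_dist) >= abs(obs))
```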
## Evaluate the design: Compare to a randomized experiment.
We now ask whether the within-set differences look like those that would be
expected from a randomized experiment.^[If you look at the documentation for
`balanceTest` you'll see that it can also handle clustered data as well as
other options.]
```{r compare_bal}
#| eval: true
#| echo: true
balfmla_new <- stats::update(cov_fmla,.~.+ strata(fm1)+strata(pm1))
xbfm1 <- balanceTest(balfmla_new, data = dat, p.adjust.method="holm")
xbfm1$results[, , ]
xbfm1$overall
```
## What is balanceTest doing?
It compares the strata-weighted average of within-strata differences to what
would be expected if we were to repeat an experiment with the same stratified
design and the same covariate values (`balanceTest` uses a large-sample Normal
approximation to this distribution).
```{r xbagain, echo=TRUE}
setmeanDiffs <- dat %>%
group_by(fm1) %>%
summarise(
diffage = mean(age[trtbin == 1]) - mean(age[trtbin == 0]),
nb = n(),
nTb = sum(trtbin),
nCb = sum(1 - trtbin),
hwt = (2 * (nCb * nTb) / (nTb + nCb))
)
setmeanDiffs
```
## What is balanceTest doing with multiple sets/blocks?
The test statistic is a weighted average of the set-specific differences (*same
approach as we would use to test the null in a block-randomized experiment*)
```{r wtmns, echo=TRUE}
## The descriptive mean difference using block-size weights
with(setmeanDiffs, sum(diffage * nTb / sum(nTb)))
## The mean diff used as the observed value in the testing
with(setmeanDiffs, sum(diffage * hwt / sum(hwt)))
## Compare to balanceTest output
xbfm1$results[, , "fm1"]
```
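Putting the pieces together, here is a self-contained sketch (toy data and hypothetical names; an approximation to the idea, not `balanceTest`'s exact internals) of the large-sample z statistic: divide the harmonic-weighted mean difference by an approximation to its randomization standard deviation.

```r
## Toy sketch of the Normal-approximation test behind balanceTest
set.seed(1234)
toy <- data.frame(
  set = rep(1:5, each = 8),
  z   = rep(c(1, 1, 1, 0, 0, 0, 0, 0), 5), # 3 treated, 5 controls per set
  x   = rnorm(40)
)
per_set <- t(sapply(split(toy, toy$set), function(d) {
  nT <- sum(d$z); nC <- sum(1 - d$z)
  c(diff = mean(d$x[d$z == 1]) - mean(d$x[d$z == 0]),
    hwt  = 2 * nT * nC / (nT + nC),            # harmonic weight
    v    = var(d$x) * (1 / nT + 1 / nC))       # within-set null variance
}))
w <- per_set[, "hwt"] / sum(per_set[, "hwt"])
d_obs  <- sum(w * per_set[, "diff"])           # observed weighted difference
z_stat <- d_obs / sqrt(sum(w^2 * per_set[, "v"]))
c(d = d_obs, z = z_stat, p = 2 * pnorm(-abs(z_stat)))
```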
## Something new: Calipers
Maybe we would prefer to limit the worst matches?
```{r caliper_one_dim}
quantile(as.vector(abs_dist), seq(0, 1, .1))
fm2 <- fullmatch(abs_dist + caliper(abs_dist, .08), data = dat)
summary(fm2, min.controls = 0, max.controls = Inf)
balfmla_new2 <- stats::update(balfmla_new,.~.+ strata(fm2))
xb2 <- balanceTest(balfmla_new2, data = dat, p.adjust.method="holm")
xb2$overall
xb2$results[, , ]
```
### Tools to improve design
#### Calipers
The optmatch package allows calipers, which forbid certain pairs from being matched.^[You can implement penalties by hand.] Here, for example, we forbid comparisons that differ by more than 2 standardized propensity score distances. We first inspect the distribution of the distances to choose a sensible width.
```{r}
## First inspect the distance matrix itself: how are the distances distributed?
quantile(as.vector(ps_dist), seq(0, 1, .1))
## Next, apply a caliper (entries beyond the caliper width become Inf)
ps_distCal <- ps_dist + caliper(ps_dist, 2)
as.matrix(ps_dist)[5:10, 5:10]
as.matrix(ps_distCal)[5:10, 5:10]
summary(ps_distCal)
```
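In base-R terms, `caliper()` amounts to an additive penalty matrix whose entries are `Inf` beyond the chosen width, so the matching solver can never pair those units. A toy sketch (hypothetical unit names, not optmatch internals):

```r
## Toy version of a caliper: distances beyond the width become Inf
d <- matrix(c(0.1, 3.0, 0.5,
              2.5, 0.2, 1.9),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("t1", "t2"), c("c1", "c2", "c3")))
width <- 2
penalty <- ifelse(d > width, Inf, 0) # what the caliper contributes
d + penalty # t1-c2 and t2-c1 are now forbidden (Inf)
```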
## Calipers
The optmatch package allows calipers (which forbid certain pairs from being
matched).^[You can implement penalties by hand.] Here, for example, we forbid
comparisons which differ by more than 2 standard deviations on the propensity
score. (Notice that we also use the `propensity.model` option to `summary` here
to get a quick look at the balance test:)
```{r}
fmCal1 <- fullmatch(ps_dist + caliper(ps_dist, 2), data = dat, tol = .00001)
summary(fmCal1, min.controls = 0, max.controls = Inf, propensity.model = psmod)
pmCal1 <- pairmatch(ps_dist + caliper(ps_dist, 2), data = dat, remove.unmatchables = TRUE)
summary(pmCal1, propensity.model = psmod)
```
## Calipers
Another example: we may want to match on Mahalanobis distance but disallow any
pairs with an extreme propensity score distance and/or extreme differences in
baseline homicide rates (here combining several criteria at once).
```{r}
## Create an R formula object from vectors of variable names
balfmla <- reformulate(c("tenure", "age"), response = "trtbin")
## Create a mahalanobis distance matrix (of rank transformed data)
mhdist <- match_on(balfmla, data = dat, method = "rank_mahalanobis")
## Now make a matrix recording absolute differences between neighborhoods in
## terms of baseline homicide rate
tmppreY <- dat$preY
names(tmppreY) <- rownames(dat)
abs_dist <- match_on(tmppreY, z = dat$trtbin, data = dat)
abs_dist[1:3, 1:3]
quantile(as.vector(abs_dist), seq(0, 1, .1))
quantile(as.vector(mhdist), seq(0, 1, .1))
## Now create a new distance matrix: match on the Mahalanobis distance,
## censoring pairs with extreme propensity or baseline-homicide distances
distCal <- mhdist + caliper(ps_dist, 2) + caliper(abs_dist, 2)
as.matrix(distCal)[5:10, 5:10]
## Compare to:
as.matrix(mhdist)[5:10, 5:10]
```
## Calipers
Now use this new matrix to create stratified designs --- possibly excluding some units. We also show the `tol` argument here: the version with the tighter tolerance produces a solution with smaller overall distances.
```{r}
fmCal2a <- fullmatch(distCal, data = dat, tol = .001)
summary(fmCal2a, min.controls = 0, max.controls = Inf)
fmCal2b <- fullmatch(distCal, data = dat, tol = .00001)
summary(fmCal2b, min.controls = 0, max.controls = Inf, propensity.model = psmod)
dat$fmCal2a <- fmCal2a
dat$fmCal2b <- fmCal2b
fmCal2a_dists <- matched.distances(fmCal2a, distCal)
fmCal2b_dists <- matched.distances(fmCal2b, distCal)
mean(unlist(fmCal2a_dists))
mean(unlist(fmCal2b_dists))
```
## Exact Matching
We often have covariates that are categorical/nominal and for which we really care about strong balance. One approach to this problem is to match **exactly** on one or more such covariates. If `fullmatch` or `match_on` is running slowly, this is also a way to speed things up.
```{r echo=FALSE}
dat$gradeLowHi <- ifelse(dat$grade %in% c(4,5,6), "hi", "lo")
```
```{r}
dist2 <- ps_dist + exactMatch(trtbin ~ gradeLowHi, data = dat)
summary(dist2)
## or mhdist <- match_on(balfmla,within=exactMatch(trtbin~gradeLowHi,data=dat),data=dat,method="rank_mahalanobis")
## or fmEx1 <- fullmatch(update(balfmla,.~.+strata(gradeLowHi)),data=dat,method="rank_mahalanobis")
fmEx1 <- fullmatch(dist2, data = dat, tol = .00001)
summary(fmEx1, min.controls = 0, max.controls = Inf, propensity.model = psmod)
print(fmEx1, grouped = TRUE)
dat$fmEx1 <- fmEx1
```
## Exact Matching
```{r}
ftable(grade = dat$gradeLowHi, Trt = dat$trtbin, fmEx1, col.vars = c("grade", "Trt"))
```
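Mechanically, `exactMatch` behaves like a distance matrix that is 0 within a category and `Inf` across categories; adding it to another distance censors all cross-category pairs. A toy sketch (hypothetical unit names, not optmatch internals):

```r
## Toy version of exactMatch: cross-category pairs get Inf
cat_trt <- c(a1 = "hi", a2 = "lo")             # treated units' categories
cat_ctl <- c(b1 = "hi", b2 = "hi", b3 = "lo")  # control units' categories
em <- outer(cat_trt, cat_ctl, function(t, c) ifelse(t == c, 0, Inf))
em # a1 can match b1/b2 only; a2 can match b3 only
```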
- **Exact Matching**
- **Calipers**
- **Missing data**
### Using designmatch
## Non-bipartite matching using designmatch
## Estimate effects
### After bipartite matching
### After non-bipartite matching
## Assess sensitivity to unobserved confounders
```{r}
library(senstrat)
library(sensemakr)
```
# References