This repository has been archived by the owner on Sep 15, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
/
R4SME 06 case-control.Rmd
297 lines (218 loc) · 7.11 KB
/
R4SME 06 case-control.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
---
title: "R for SME 6: Analysis of an unmatched case-control study"
author: Andrea Mazzella [link](https://github.com/andreamazzella)
output: html_notebook
---
# Part 1: Introduction
## A) Basics
Install these packages if you haven't already
```{r}
install.packages("epiDisplay")
install.packages("pubh")
install.packages("statix")
```
Load packages
- Did you know? library() and require() are equivalent.
```{r}
require(haven)
library(magrittr) # %$% pipe
require(epiDisplay) # Epi functions
library(pubh) # chi-for-trend
require(rstatix) # test-for-trend
library(tidyverse) # %>% pipe, data management...
```
## B) Data exploration & management
Make sure you have the mortality.dta dataset in the same folder as this .rmd, and load it. It contains data on HIV infection among women in Mwanza, Tanzania.
```{r}
# Import the dataset
mwanza <- read_dta("./mwanza.dta")
```
Rather unhelpfully, the values are not labelled, and the Stata metadata file format, .hlp, can only be read in Stata or in Windows 7.
*Thanks, Stata!*
- How many observations does the dataset have?
- How many variables and what do these variables code?
- Which type are these variables?
```{r}
```
Solution:
```{r}
#Familiarise yourself with the data
View(mwanza)
glimpse(mwanza)
summary(mwanza)
```
1. Recategorise the variables "ed" and "age1" into two new variables called ed2 and age2 so that:
- ed2 is binary (1 = none; 2 = any formal education)
- age2 is grouped this way: 1=15-24, 2=25-34, 3=35+ years.
```{r}
# Tabulate all possible values of ed and age1
# Recategorise and label education level
# Recategorise and label age
# Check it worked ok
```
Possible solution:
```{r}
# Tabulate all possible values of ed and age1
mwanza %$% table(ed)
mwanza %$% table(age1)
# Recategorise and label education level
mwanza %<>%
mutate(ed2 = as.factor(
case_when(
ed == 1 ~ "none",
ed == 2 ~ "any formal",
ed == 3 ~ "any formal",
ed == 4 ~ "any formal"
)
))
# Recategorise and label age
mwanza %<>%
mutate(age2 = as.factor(
case_when(
age1 <= 2 ~ "15-24",
age1 == 3 | age1 == 4 ~ "25-34",
age1 == 5 | age1 == 6 ~ "35+"
)
))
# Check it worked ok
mwanza %$% table(ed,ed2)
mwanza %$% table(age1,age2)
```
Also, make "case" a factor.
```{r}
# Make "case" a factor and label it
mwanza %<>%
mutate(case = factor(case,
levels = c(0, 1),
labels = c("Control", "HIV_case")
))
summary(mwanza$case)
```
## C) Data analysis
2. Obtain the crude OR for education as a risk factor for HIV. The commands are from {epiDisplay}: tabpct() and cc().
- Note that cc() also calculates the Fisher's exact test automatically, unlike Stata.
```{r}
# 2x2 table with row percentages
# 2x2 table with crude OR
```
Possible solution:
```{r}
# 2x2 table with row percentages
mwanza %$% tabpct(case, ed2, percent= "row", graph = F)
# 2x2 table with crude OR
mwanza %$% cc(case, ed2, graph = F)
```
3. Assess whether age is a confounder or an effect modifier in the association between education and HIV.
- Obtain tables of HIV/education stratified by age
- Estimate ORs of HIV/education by different age groups (epiDisplay::mhor() - you need to specify the package because {pubh} also has a function called mhor)
- What is the Mantel-Haenszel summary estimate of the OR?
- What is the interpretation of the test for interaction?
```{r}
# Obtain tables of HIV/education stratified by age
# Estimate ORs of HIV/education by different age groups
```
Possible solution:
```{r}
# Obtain tables of HIV/education stratified by age
print("15-24 years stratum")
mwanza %>%
filter(age2 == "15-24") %$%
tabpct(case, ed2, percent= "row", graph = F)
print("25-34 years stratum")
mwanza %>%
filter(age2 == "25-34") %$%
tabpct(case, ed2, percent= "row", graph = F)
print("35+ years stratum")
mwanza %>%
filter(age2 == "35+") %$%
tabpct(case, ed2, percent= "row", graph = F)
# Estimate ORs of HIV/education by different age groups
mwanza %$% epiDisplay::mhor(case, ed2, age2, graph = F)
```
4. Assess whether religion is a confounder or an effect modifier between education and HIV infection.
- The "rel" variable is coded as such: 1 = Muslim, 2 = Catholic, 3 = Protestant, 4 = Other, 9 = missing value.
```{r}
# Recode missing values (9) as NA
# Exploratory tabulation
# Obtain tables of HIV/education stratified by religion
# Estimate ORs of HIV/education by different religions
```
Possible solution:
```{r}
# Recode missing values (9) as NA
summary(mwanza$rel)
mwanza %<>%
mutate(rel = na_if(rel, 9))
summary(mwanza$rel)
# Exploratory tabulation
mwanza %$% tabpct(case, rel, percent= "row", graph = F)
mwanza %$% tabpct(ed2, rel, percent= "row", graph = F)
# Obtain tables of HIV/education stratified by religion
print("Muslim")
mwanza %>%
filter(rel == "1") %$%
tabpct(case, ed2, percent= "row", graph = F)
print("Catholic")
mwanza %>%
filter(rel == "2") %$%
tabpct(case, ed2, percent= "row", graph = F)
print("Protestant")
mwanza %>%
filter(rel == "3") %$%
tabpct(case, ed2, percent= "row", graph = F)
print("Other")
mwanza %>%
filter(rel == "4") %$%
tabpct(case, ed2, percent= "row", graph = F)
# Estimate ORs of HIV/education by different religions
mwanza %$% epiDisplay::mhor(case, ed2, rel, graph = F)
```
5. Dealing with missing values for a potential confounder (npa)
The variable npa contains information on the number of sexual partners. It's coded: 1 (0-1), 2 (2-4), 3 (5-9), 4 (10-19), 9 (missing value).
- Tell R which values are missing. Observations with missing values will automatically be excluded from this analysis.
```{r}
```
Possible solution:
```{r}
mwanza %$% table(npa)
# Replace rel "9" with "NA"
mwanza$npa <- na_if(mwanza$npa, 9)
# 2x2 table with crude OR
mwanza %$% cc(case, ed2, graph = F)
# Estimate ORs of HIV/education by different age groups
mwanza %$% epiDisplay::mhor(case, ed2, npa, graph = F)
```
6. Exploring a dose-response relationship
- Create a new variable, npa2, with values of 0, 3, 7, and 15 instead with the original values. These correspond to an average of partners in each group.
```{r}
```
Possible solution:
```{r}
# Duplicate the npa variable and change its values
mwanza %<>%
mutate(npa2 = npa) %>%
mutate(npa2 = recode(
npa2,
`1` = 0,
`2` = 3,
`3` = 7,
`4` = 15
))
mwanza %$% table(npa, npa2)
```
- Perform a chi-squared test for trend of odds for the exposure npa2 and outcome.
```{r}
# Odds ratio for each partner-number group compared to those with 0/1 partner.
odds_trend(case ~ npa2, data = mwanza)
# Test for trend - {statix}
case_by_npa2 <- mwanza %$% table(case, npa2)
prop_trend_test(case_by_npa2)
```
- Perform a test for departure from trend for npa2.
```{r}
# Calculate difference between chi2 and trend test
departure_chi <-
chisq_test(case_by_npa2)$statistic - prop_trend_test(case_by_npa2)$statistic
# Test for departure from linear trend
pchisq(departure_chi, 2)
```