-
Notifications
You must be signed in to change notification settings - Fork 0
/
ae-12-exam-3-review.qmd
216 lines (141 loc) · 4.76 KB
/
ae-12-exam-3-review.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
---
title: "AE 12: Exam 3 Review"
author: "Add your name here"
format: pdf
editor: visual
---
# Introduction
## Packages
```{r}
#| label: load-pkgs
#| message: false
library(tidyverse)
library(tidymodels)
library(knitr)
library(Stat2Data)
library(rms)
library(nnet)
```
## Data
As part of a study of the effects of predatory intertidal crab species on snail populations, researchers measured the mean closing forces and the propodus heights of the claws on several crabs of three species.
![](images/paste-7BA685C6.png){width="300"}
```{r}
#| warning: false
claws <- read_csv("data/claws.csv")
```
We will use the following variables:
- `force`: Closing force of claw (newtons)
- `height`: Propodus height (mm)
- `species`: Crab species - Cp(Cancer productus), Hn (Hemigrapsus nudus), Lb(Lophopanopeus bellus)
- `lb`: 1 if Lophopanopeus bellus species, 0 otherwise
- `hn`: 1 if Hemigrapsus nudus species, 0 otherwise
- `cp`: 1 if Cancer productus species, 0 otherwise
- `force_cent`: mean centered force
- `height_cent`: mean centered height
Before we get started, let's make the categorical and indicator variables factors.
```{r}
claws <- claws %>%
mutate(
species = as_factor(species),
lb = as_factor(lb),
hn = as_factor(hn),
cp = as_factor(cp)
)
```
\pagebreak
# Part 1
## Probabilities vs. odds vs. log-odds
Why we use log-odds as response variable: <https://sta210-s22.github.io/website/slides/lec-18.html#/do-teenagers-get-7-hours-of-sleep>
## Exercise 1
Fill in the blanks:
- Use log-odds to ...
- Use odds to ...
- Use probabilities to ...
## Exercise 2
Suppose we want to use force to determine whether or not a crab is from the Lophopanopeus bellus (Lb) species.
Why should we use a logistic regression model for this analysis?
## Exercise 3
We will use the mean-centered variables for force in the model.
The model output is below.
Write the equation of the model produced by R.
Don't forget to fill in the blanks for `…`.
```{r echo = F}
lb_fit_1 <- logistic_reg() %>%
set_engine("glm") %>%
fit(lb ~ force_cent, data = claws)
tidy(lb_fit_1, conf.int = TRUE) %>%
kable(digits = 3)
```
Let $\pi$ be...
$$\log\Big(\frac{\hat{\pi}}{1 - \hat{\pi}}\Big) = $$
## Exercise 4
Interpret the intercept in the context of the data.
## Exercise 5
Interpret the effect of force in the context of the data.
## Exercise 6
Now let's consider adding `height_cent` to the model.
Fit the model that includes `height_cent`.
Then use AIC to choose the model that best fits the data.
```{r}
lb_fit_2 <- logistic_reg() %>%
set_engine("glm") %>%
fit(lb ~ force_cent + height_cent, data = claws)
tidy(lb_fit_2, conf.int = TRUE)
```
## Exercise 7
What do the following mean in the context of this data.
Explain and calculate them.
- Sensitivity: ...
- Specificity: ...
- Negative predictive power: ...
\pagebreak
# Part 2
Let's extend the model to use force and height to predict the species (Hn, Cp, and Lb).
The model output is below:
```{r}
species_fit <- multinom_reg() %>%
set_engine("nnet") %>%
fit(species ~ force_cent + height_cent, data = claws)
species_fit <- repair_call(species_fit, data = claws)
tidy(species_fit, conf.int = TRUE)
```
## Exercise 8
Write the equation of the model.
$$\log\Big(\frac{\hat{\pi}_{Hn}}{\hat{\pi}_{Cp}}\Big) = $$
$$\log\Big(\frac{\hat{\pi}_{Lb}}{\hat{\pi}_{Cp}}\Big) = $$
## Exercise 9
- Interpret the intercept for the odds a crab is Hn vs. Cp species.
- Interpret the effect of force on the odds a crab is Lb vs. Cp species.
## Exercise 10
Interpret the effect of force on the odds a crab is in the Hn vs. Lb species.
CAUTION: We can write an interpretation based on the estimated coefficients; however, we can't make any inferential conclusions for this question based on the current model.
We would need to refit the model with Lb as the baseline category to do so.
\pagebreak
## Exercise 11
Conditions for multinomial logistic (and logistic models as well):
- Independence:
- Randomness:
- Linearity:
```{r}
#| layout-ncol: 2
emplogitplot1(lb ~ force, data = claws, ngroups = 10)
emplogitplot1(lb ~ height, data = claws, ngroups = 10)
```
```{r}
#| layout-ncol: 2
# add code here for other species here
```
```{r}
#| layout-ncol: 2
# add code here for other species here
```
\pagebreak
# Part 3
## Checking for multicollinearity in logistic and multinomial logistic
Similar to multiple linear regression, we can also check for multicollinearity in logistic and multinomial logistic models.
- Use the `vif` function to check for multicollinearity in logistic regression.
```{r}
```
- The `vif` function doesn't work for the multinomial logistic regression models, so we can look at a correlation matrix of the predictors as a way to assess if the predictors are highly correlated:
```{r}
```