---
title: "Homework 06"
---
## Homework 06 - due 11/22/2017
### Logistic Regression
For Homework 06, you will use the HELP dataset. Learn more at:

* [https://melindahiggins2000.github.io/N736Fall2017_HELPdataset/](https://melindahiggins2000.github.io/N736Fall2017_HELPdataset/)
* [https://github.com/melindahiggins2000/N736Fall2017_HELPdataset](https://github.com/melindahiggins2000/N736Fall2017_HELPdataset)

Refer to the logistic regression example and code we ran during lessons 18 and 19; see [https://github.com/melindahiggins2000/N736Fall2017_lesson1819](https://github.com/melindahiggins2000/N736Fall2017_lesson1819)

For the HELP dataset:

* OUTCOME VARIABLE: consider the variable `g1b`, "Experienced serious thoughts of suicide (last 30 days) - Baseline"
* PREDICTOR VARIABLES: consider these variables as potential predictors for `g1b`:
    - `age`, `female`, `pss_fr`, `homeless`, `pcs`, `mcs`, `cesd`, `indtot`

Complete the following:

1. Consider the continuous variable `cesd` as a predictor for `g1b`.
    a. Run a logistic regression of the probability of suicidal thoughts (`g1b`) given the depressive symptoms score (`cesd`).
    b. Make a plot of the predicted probability of suicidal thoughts (`g1b`) by the depressive symptoms score (`cesd`).
    c. What value of `cesd` leads to a predicted probability of suicidal thoughts >= 0.5? _(hint: use the plot you just made)_
2. Using variable selection methods, develop a logistic regression model for the probability of suicidal thoughts (`g1b`), considering all of these variables for possible inclusion: `age`, `female`, `pss_fr`, `homeless`, `pcs`, `mcs`, `cesd`, `indtot`.
    a. Present the final model results.
    b. Write a few sentences describing your results, including:
        i. model fit
        ii. model classification table results (remember to report the threshold used for the classification table; you can change it from 0.5 if you think a different threshold might work better)
        iii. odds ratios for each significant predictor in the model

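
As a rough illustration of the mechanics behind these steps, here is a minimal sketch in base R. It runs on simulated data, not the HELP dataset: the data frame `sim`, the coefficients used to generate it, and the object names `fit`, `grid`, `cesd_at_half`, `or_table`, and `class_table` are all assumptions made for the example, and the approach may differ from the code used in lessons 18 and 19.

```r
# Simulated stand-in for the HELP data (NOT the real dataset):
# a CES-D-like score and a binary outcome drawn from a known logistic model.
set.seed(736)
n <- 400
cesd <- round(runif(n, min = 0, max = 60))
g1b <- rbinom(n, size = 1, prob = plogis(-3 + 0.06 * cesd))
sim <- data.frame(g1b = g1b, cesd = cesd)

# Logistic regression of g1b on cesd
fit <- glm(g1b ~ cesd, data = sim, family = binomial)

# Predicted probabilities over a grid of cesd values, then a simple plot
grid <- data.frame(cesd = 0:60)
grid$p_hat <- predict(fit, newdata = grid, type = "response")
plot(grid$cesd, grid$p_hat, type = "l",
     xlab = "cesd", ylab = "Predicted probability of g1b = 1")
abline(h = 0.5, lty = 2)  # reference line at probability 0.5

# The fitted probability equals 0.5 where the linear predictor is 0,
# that is, at cesd = -intercept / slope
b <- coef(fit)
cesd_at_half <- unname(-b[1] / b[2])

# Odds ratios with Wald 95% confidence intervals
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))

# Classification table at a chosen threshold (report whichever value you use)
threshold <- 0.5
pred_class <- as.integer(fitted(fit) >= threshold)
class_table <- table(observed = sim$g1b, predicted = pred_class)
```

Reading the crossing point off the plot and computing it as `-intercept / slope` should agree, which makes for a quick sanity check. For the variable selection in part 2, base R's `step()` performs AIC-based stepwise selection on a fitted `glm`; whether that matches the method intended for this assignment is an assumption.
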
## Variables in HELP dataset to be used for Homework 06:
```{r, echo=FALSE, message=FALSE, warning=FALSE}
helpdata <- readRDS("helpmkh.rds")
library(tidyverse)
sub1 <- helpdata %>%
  select(g1b, age, female, pss_fr,
         homeless, pcs, mcs, cesd, indtot)

# helper: return the "label" attribute of a variable,
# taken from the output of attributes()
getlabel <- function(x) attributes(x)$label
# getlabel(sub1$age)

library(purrr)
ldf <- purrr::map_df(sub1, getlabel) # a 1x9 tibble: one label per selected variable
# t(ldf) # transpose to a 9x1 single-column matrix for easier reading

# use knitr to build a table of these
# variable labels for R Markdown
library(knitr)
knitr::kable(t(ldf),
             col.names = c("Variable Label"),
             caption = "Use these variables from the HELP dataset for Homework 06")
```
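
The chunk above relies on each variable carrying a `label` attribute. A minimal sketch of how that lookup behaves, using a toy data frame with made-up labels (the names `toy`, `get_label`, `labels`, and `label_table` are hypothetical and not part of the HELP dataset or the original code):

```r
# Toy data frame (hypothetical values) with a "label" attribute on each column
toy <- data.frame(age = c(30, 45), cesd = c(12, 40))
attr(toy$age, "label") <- "Age at baseline (toy label)"
attr(toy$cesd, "label") <- "CES-D score (toy label)"

# attr() returns NULL when a column has no label; substitute NA so that
# every column yields exactly one value
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}

# One label per column, collected into a two-column lookup table
labels <- vapply(toy, get_label, character(1))
label_table <- data.frame(variable = names(labels), label = unname(labels))
```

Returning `NA` for unlabeled columns is a design choice for this sketch: it keeps one row per variable, so a missing label shows up in the table instead of silently disappearing.
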