After learning basic data science skills in Bioinformatics Applications I, students will gain a deeper understanding of statistics and machine learning, including what can go wrong in data analyses. Principles of reproducible research will be emphasized. Analyses will be performed primarily in R, and students will continue to develop their data science skills.
- Instructor: Randall Johnson, PhD
- Office Hours: In-person office hours will be held Thursdays immediately after class.
- Prerequisites: BIFX 552
- Textbook: Data Analysis for the Life Sciences
- Communications: All course communications will be posted on
Blackboard. In order to receive timely notifications, it is
recommended that you do one or more of the following:
- Check Blackboard often
- Set your Blackboard email notifications to alert you when something is posted
- Download the phone app and enable push notifications (this may not be the best option this term, as the app was just released and seems to be a little limited).
On completion of this course, students should be comfortable with the following:
- Basic R programming
- R package management
- Linear Regression
- Logistic Regression
- Other machine learning techniques, at an introductory level
In addition to weekly reading assignments, students must view Blackboard modules containing instructional vignettes. Each module is followed by a short quiz to gauge understanding prior to class. Students will be given a score for each quiz, but only participation will be tracked for the purpose of grading (i.e., if you complete both the module and the quiz, full points will be awarded).
Grades will be based on completion of homework, in-class participation, and two exams.
- Homework - 30%
- In-class participation - 30%
- Mid-term - 20%
- Final exam - 20%
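As a rough illustration of how the weights above combine, the overall score can be written as a weighted sum. The symbols H, P, M, and F (percent scores for homework, in-class participation, the mid-term, and the final exam) and the example scores are hypothetical, chosen only to show the arithmetic. Rounding and letter-grade cutoffs are not specified here.

```latex
\[
\text{Overall score} = 0.30\,H + 0.30\,P + 0.20\,M + 0.20\,F
\]
% Hypothetical example: H = 90, P = 100, M = 80, F = 85
\[
0.30(90) + 0.30(100) + 0.20(80) + 0.20(85) = 27 + 30 + 16 + 17 = 90
\]
```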
In the event of severe weather resulting in the closure of Hood College and the cancellation of a regularly scheduled class, the material from the missed class will be posted on Blackboard, and a live chat session will be held to work through the material and answer questions.
Reading assignments are from Data Analysis for the Life Sciences unless otherwise specified, and they should be read prior to class. More details on reading assignments will be given on Blackboard.
| Week | Date | Topics | Reading |
|---|---|---|---|
| 1 | Jan 18 | Class intro; R review | |
| Linear Regression | | | |
| 2 | Jan 25 | Model building and bias | |
| 3 | Feb 1 | Regression Assumptions | |
| 4 | Feb 8 | Complex Interactions | |
| 5 | Feb 15 | Confidence Intervals and Tests of Association | |
| 6 | Feb 22 | Missing Data; Model Building Revisited | |
| 7 | Mar 1 | Review | |
| 8 | Mar 8 | Mid Term Exam | |
| | Mar 15 | Spring Break! | |
| 9 | Mar 22 | Generalized Linear Models | |
| Logistic Regression | | | |
| 10 | Mar 29 | Odds Ratios | |
| 11 | Apr 5 | Regression Assumptions | |
| 12 | Apr 12 | Power and Sample Size | |
| Intro to Machine Learning | | | |
| 13 | Apr 19 | Clustering and Population Structure | |
| 14 | Apr 26 | Neural Networks; Deep Learning | |
| 15 | May 3 | Review | |
| 16 | May 10 | Final Exam | |