This class introduces the applied data science skills needed by bioinformatics professionals. It focuses on reproducible bioinformatics research and covers the following topics and tools: beginner to intermediate use of the Unix command line, working with remote computing resources, version control, R and Bioconductor, tools for manipulating sequence data, and pipeline creation.
- Instructor: Randall Johnson, PhD
- Virtual office hours will be held at the following times:
    - Mondays, 12-1 PM
    - Wednesdays, 8-9 AM
    - Thursdays, immediately after the weekly live coding demo (which starts at 5:30 PM)
- Prerequisites: BIFX 503
- Textbooks: We will not be using a textbook this term, but you may find the following helpful. All are freely available online.
- Communications: All course communications will be posted on Blackboard. To receive timely notifications, we recommend that you do one or more of the following:
    - Check Blackboard often
    - Set your Blackboard email notifications to alert you when something is posted
    - Use the Blackboard mobile app
- Code of Conduct: We want to foster a safe, enjoyable, and productive learning environment. People like you make our program better. To this end, all participants are expected to follow the course code of conduct, located in the course documents on Blackboard.
On completion of this course, students should be comfortable with the following:
- Using the Unix command line to manipulate data and perform bioinformatic analysis tasks
- Logging into and using remote computing resources
- Working with version-controlled code repositories in a collaborative environment
- Using R and Bioconductor to perform bioinformatic analysis tasks
- Stitching a series of commands and/or programs together into a reusable pipeline
Each weekly coding demo has an accompanying homework assignment, giving you an opportunity to practice what we cover that week. Homework and quizzes will typically be due on Wednesday evenings.
Live coding demos will be held every Thursday evening at 5:30 PM, and recordings will be posted on Blackboard for asynchronous viewing and review. Because bioinformatics is such a fast-moving field, demos will be based on up-to-date, publicly available material developed by experts in the Data Carpentry and Bioconductor communities.
Grades will be based on completion of homework, quizzes, and two exams:
- Coding demo homework - 30%
- Quizzes - 30%
- Mid-term - 20%
- Final exam - 20%
An R proficiency quiz will be given during the first week of class. Depending on how well the class does on this quiz, we may adjust the schedule below starting in week 8.
Week | Date | Topics |
---|---|---|
1 | Aug 20 | Class intro, Project Organization, Introducing the Shell |
2 | Aug 27 | Unix 1 (2 - 4) |
3 | Sep 3 | Unix 2 (5 - 6), Loop and Script Practice |
4 | Sep 10 | Regular Expressions Practice, Slurm, Data wrangling (1 - 3) |
5 | Sep 17 | Variant Calling Workflow |
6 | Sep 24 | Git (1 - 13) |
7 | Oct 1 | Exam |
8 | Oct 8 | R Intro (1 - 3) |
9 | Oct 15 | Subsetting Data, R Control Flow |
10 | Oct 22 | R Graphics, R Vectorization, R Functions |
11 | Oct 29 | R Advanced Data (11, 13 - 14) |
12 | Nov 5 | R Reports, R Software Tips |
13 | Nov 12 | Pipelines with Snakemake |
14 | Nov 19 | Review / Final Exam |
15 | Nov 24 | Exam Due |