This repository contains the computing bootcamp materials for incoming Ph.D. and M.S. students to the Department of Statistical Science at Duke University. These materials are adapted from those developed by Mine Çetinkaya-Rundel and Colin Rundel for the 2018 computing bootcamp.
To get access to the materials, click the green Code button and download a zip file of this repository. In the version control video and slides, we'll go through the steps of forking and cloning this repository.
In slides/ you'll find three slide decks, each available as HTML and PDF, on the topics below.
- 00_introduction_and_resources.html / 00_introduction_and_resources.pdf
- 01_r_version_control.html / 01_r_version_control.pdf
- 02_python.html / 02_python.pdf
In exercises/ you'll find one Rmd (R Markdown) file and two ipynb (Python notebook) files.
At the start of each slide deck are links to companion videos hosted through Duke's Warpwire platform. The videos supplement the contents of the slides. We recommend working through the slides and corresponding videos in the order presented above.
Solutions to exercises will be added to this repository later in the bootcamp.
- Duke computing resources and getting help
- Duke VPN
- Duke software
- Compute cluster
- DSS computing resources and getting help
- RStudio Pro servers
- Introduce git and GitHub
- Initiate a project directory, understand the git workflow
- Discuss the role of version control in reproducibility
- Discuss version control best practices
- Recognize the problems that reproducible research helps address, featuring a brief discussion of case studies gone wrong and how reproducible research could have possibly helped
- Identify pain points in getting your analysis to be reproducible
- The role of documentation, sharing, automation, and organization in making your research more reproducible
- Introduce some tools to solve these problems, specifically R / RStudio / R Markdown
- Organize projects and folders to enable reproducibility and reusability
- Understand the structure of data files and the importance of documenting all changes made
- Create a reproducible project workflow using R / RStudio / R Markdown
- Navigate R Markdown and RStudio
- Analyze data and create graphics with the tidyverse package
- Discuss workflow
- Navigate Jupyter notebooks
- Introduce Python data structures, control flow, functions, and the basics of object-oriented programming
- Discuss popular Python packages including NumPy, SciPy, pandas, matplotlib, seaborn, and scikit-learn
- Highlight similarities and differences between Python and R
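
As a rough preview of the Python topics listed above (data structures, control flow, functions, and basic object-oriented programming), here is a minimal sketch that uses only the standard library. It is not taken from the bootcamp slides or exercises; the names `exercise_files`, `mean_above`, and `RunningMean` and all values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names and values are hypothetical,
# not taken from the bootcamp slides or exercises.

# Data structures
scores = [88, 92, 79]                    # list: ordered and mutable
languages = ("R", "Python")              # tuple: ordered and immutable
exercise_files = {"R": 1, "Python": 2}   # dict: key-value mapping
unique_scores = set(scores)              # set: unordered, unique elements


# A function that uses control flow
def mean_above(values, threshold):
    """Return the mean of the values strictly greater than threshold, or None."""
    kept = [v for v in values if v > threshold]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Basic object-oriented programming: a class with state and a method
class RunningMean:
    """Track the mean of a stream of numbers."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, x):
        """Add one observation and return the current mean."""
        self.n += 1
        self.total += x
        return self.total / self.n


if __name__ == "__main__":
    print(mean_above(scores, 80))
    tracker = RunningMean()
    for s in scores:
        current = tracker.update(s)
    print(current)
```

One difference from R that shows up immediately: Python indexes from 0, so `scores[0]` is the first element, whereas R indexes from 1.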
See slides for references related to specific topics.