Duke University :: Department of Statistical Science Computing Bootcamp 2020

This repository contains the computing bootcamp materials for incoming Ph.D. and M.S. students to the Department of Statistical Science at Duke University. These materials are adapted from those developed by Mine Çetinkaya-Rundel and Colin Rundel for the 2018 computing bootcamp.

Getting started

Access materials

To access the materials, click the green Code button and download a zip file of this repository. The version control video and slides walk through the steps of forking and cloning this repository.

Contents

In slides/ you'll find three slide decks, each available in HTML and PDF, on the topics below.

  1. 00_introduction_and_resources.html / 00_introduction_and_resources.pdf
  2. 01_r_version_control.html / 01_r_version_control.pdf
  3. 02_python.html / 02_python.pdf

In exercises/ you'll find one Rmd (R Markdown) file and two ipynb (Jupyter notebook) files.

At the start of each slide deck are links to companion videos hosted on Duke's Warpwire platform. The videos supplement the slides. Work through the slides and their corresponding videos in the order listed above.

Solutions to exercises will be added to this repository later in the bootcamp.

Computing resources

  • Duke computing resources and getting help
    • Duke VPN
    • Duke software
    • Compute cluster
  • DSS computing resources and getting help
    • RStudio Pro servers

Version control and R

Version control

  • Introduce git and GitHub
  • Initiate a project directory, understand the git workflow
  • Discuss the role of version control in reproducibility
  • Discuss version control best practices

Introduction to reproducible research

  • Recognize the problems that reproducible research helps address, with a brief discussion of case studies that went wrong and how reproducible research might have helped
  • Identify pain points in getting your analysis to be reproducible
  • Discuss the role of documentation, sharing, automation, and organization in making your research more reproducible
  • Introduce some tools to solve these problems, specifically R / RStudio / R Markdown

Organizing your project to facilitate reproducible research

  • Organize projects and folders to enable reproducibility and reusability
  • Understand the structure of data files and the importance of documenting all changes made
  • Create a reproducible project workflow using R / RStudio / R Markdown
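
One way to make the project organization ideas above concrete is to script the folder layout so every project starts the same way. The following is a minimal sketch, not taken from the bootcamp slides: the folder names in `LAYOUT`, the `scaffold_project` helper, and the `my_analysis` project name are all hypothetical and chosen only for illustration.

```python
from pathlib import Path
import tempfile

# Hypothetical folder layout; these names are illustrative assumptions,
# not a structure prescribed by the bootcamp materials.
LAYOUT = ["data/raw", "data/processed", "scripts", "output", "docs"]


def scaffold_project(root):
    """Create the folders in LAYOUT under `root` plus a README stub.

    Returns the list of folder paths that were requested.
    """
    root = Path(root)
    created = []
    for rel in LAYOUT:
        folder = root / rel
        # parents=True also creates `root` itself if it does not exist yet
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(
            "# Project\n\nDescribe data sources and every change made to them.\n"
        )
    return created


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "my_analysis"
    folders = scaffold_project(project_root)
    names = sorted(f.relative_to(project_root).as_posix() for f in folders)
    readme_exists = (project_root / "README.md").exists()
    print(names)
```

Keeping raw data in its own folder, separate from processed data and output, makes it easier to document each change and rerun the analysis from scratch.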

R / RStudio and R Markdown

  • Navigate R Markdown and RStudio
  • Analyze data and create graphics with the tidyverse
  • Discuss workflow

Python

  • Navigate Jupyter notebooks
  • Introduce Python data structures, control flow, functions, and the basics of object-oriented programming
  • Discuss popular Python packages including NumPy, SciPy, pandas, matplotlib, seaborn, and scikit-learn
  • Highlight similarities and differences between Python and R
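
As a small taste of the Python topics listed above, here is a minimal sketch that touches a data structure, control flow, a function, and a basic class. It is not taken from the bootcamp slides; the `RunningMean` class and the `summarize` function are hypothetical names chosen for illustration.

```python
from collections import Counter


class RunningMean:
    """Track the mean of a stream of numbers (basic object-oriented example)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        """Add one observation to the running totals."""
        self.count += 1
        self.total += value

    @property
    def mean(self):
        if self.count == 0:
            raise ValueError("no values seen yet")
        return self.total / self.count


def summarize(values):
    """Return the mean of `values` and a count of even and odd entries."""
    tracker = RunningMean()
    parity = Counter()  # dict-like data structure for counting
    for v in values:  # control flow: loop with a branch
        tracker.update(v)
        if v % 2 == 0:
            parity["even"] += 1
        else:
            parity["odd"] += 1
    return {"mean": tracker.mean, "parity": dict(parity)}


result = summarize([1, 2, 3, 4])
print(result)
```

The same summary in R would typically use vectorized functions such as `mean()` rather than an explicit loop, which is one of the differences between the two languages.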

References

See slides for references related to specific topics.