forked from justmarkham/DAT8

General Assembly's Data Science course in Washington, DC


victorozoh/DAT8


DAT8 Course Repository

Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15).

Instructor: Kevin Markham

Tuesday                                                | Thursday
8/18: Introduction to Data Science                     | 8/20: Command Line and Version Control
8/25: Data Reading and Cleaning                        | 8/27: Exploratory Data Analysis
9/1: Visualization, Project Discussion Deadline        | 9/3: Machine Learning, Project Question and Dataset Due
9/8: Getting Data                                      | 9/10: K-Nearest Neighbors
9/15: Basic Model Evaluation                           | 9/17: Linear Regression
9/22: First Project Presentation                       | 9/24: Logistic Regression
9/29: Advanced Model Evaluation                        | 10/1: Naive Bayes and Text Data
10/6: Natural Language Processing                      | 10/8: Kaggle Competition, Draft Paper Due
10/13: Decision Trees                                  | 10/15: Ensembling
10/20: Regularization and Clustering, Peer Review Due  | 10/22: Course Review and Bonus Topics
10/27: Bonus Topics and Final Project Presentation     | 10/29: Final Project Presentation

Before the Course Begins

  • Install Git.
  • Create an account on the GitHub website.
    • It is not necessary to download "GitHub for Windows" or "GitHub for Mac".
  • Install the Anaconda distribution of Python 2.7.x.
    • If you choose not to use Anaconda, here is a list of the Python packages you will need to install during the course.
  • We would like to check the setup of your laptop before the course begins:
    • You can have your laptop checked before the intermediate Python workshop on Tuesday 8/11 (5:30pm-6:30pm), at the 15th & K Starbucks on Saturday 8/15 (1pm-3pm), or before class on Tuesday 8/18 (5:30pm-6:30pm).
    • Alternatively, you can walk through the setup checklist yourself.
  • Once you receive an email invitation from Slack, join our "DAT8 team" and add your photo.
  • Practice Python using the resources below.
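If you would rather check your own setup, a short script like the one below reports your Python version and whether key packages import. This is a minimal sketch and not the official setup checklist. The package names in PACKAGES are assumptions, so replace them with the course's actual package list. Only ImportError counts as "missing"; other import-time errors are not caught.

```python
from __future__ import print_function  # lets this run under Python 2.7 and 3

import importlib
import sys

# Hypothetical package list; substitute the course's actual list of packages.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_packages(names):
    """Return a dict mapping each package name to True if it imports cleanly."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # Treat only ImportError as "not installed".
            status[name] = False
    return status


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, ok in sorted(check_packages(PACKAGES).items()):
        print("{0:<12} {1}".format(name, "OK" if ok else "MISSING"))
```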

Python Resources

Submission Forms


Class 1: Introduction to Data Science

Homework:

  • Work through GA's friendly command line tutorial using Terminal (Linux/Mac) or Git Bash (Windows).
  • Read through this command line reference, and complete the pre-class exercise at the bottom. (There's nothing you need to submit once you're done.)
  • Watch videos 1 through 8 (21 minutes) of Introduction to Git and GitHub, or read sections 1.1 through 2.2 of Pro Git.
  • If your laptop has any setup issues, please work with us to resolve them by Thursday. If your laptop has not yet been checked, you should come early on Thursday, or just walk through the setup checklist yourself (and let us know you have done so).

Resources:


Class 2: Command Line and Version Control

  • Slack tour
  • Review the command line pre-class exercise (code)
  • Git and GitHub (slides)
  • Intermediate command line

Homework:

Git and Markdown Resources:

  • Pro Git is an excellent book for learning Git. Read the first two chapters to gain a deeper understanding of version control and basic commands.
  • If you want to practice a lot of Git (and learn many more commands), Git Immersion looks promising.
  • If you want to understand how to contribute on GitHub, you first have to understand forks and pull requests.
  • GitRef is my favorite reference guide for Git commands, and Git quick reference for beginners is a shorter guide with commands grouped by workflow.
  • Cracking the Code to GitHub's Growth explains why GitHub is so popular among developers.
  • Markdown Cheatsheet provides a thorough set of Markdown examples with concise explanations. GitHub's Mastering Markdown is a simpler and more attractive guide, but is less comprehensive.

Command Line Resources:

  • If you want to go much deeper into the command line, Data Science at the Command Line is a great book. The companion website provides installation instructions for a "data science toolbox" (a virtual machine with many more command line tools), as well as a long reference guide to popular command line tools.
  • If you want to do more at the command line with CSV files, try out csvkit, which can be installed via pip.

Class 3: Data Reading and Cleaning

  • Git and GitHub assorted tips (slides)
  • Review command line homework (solution)
  • Python:
    • Spyder interface
    • Looping exercise
    • Lesson on file reading with airline safety data (code, data, article)
    • Data cleaning exercise
    • Walkthrough of Python homework with Chipotle data (code, data, article)
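The file reading lesson uses a common pattern: parse the CSV into a list of rows, then loop over those rows to pull out a column and convert it from text to numbers. Below is a minimal sketch using Python's built-in csv module. The column names and values are hypothetical stand-ins and do not come from the actual airline safety data. With a real file, you would pass an open file object to csv.reader instead of a list of strings.

```python
import csv

# Hypothetical sample rows standing in for a real CSV file.
SAMPLE_LINES = [
    "airline,incidents,fatalities",
    "Airline A,2,0",
    "Airline B,5,12",
    "Airline C,0,0",
]


def read_rows(lines):
    """Parse CSV lines into a header list and a list of data rows."""
    rows = list(csv.reader(lines))
    return rows[0], rows[1:]


def column_as_ints(rows, index):
    """Loop over the rows and convert one column from text to int."""
    return [int(row[index]) for row in rows]


header, data = read_rows(SAMPLE_LINES)
incidents = column_as_ints(data, header.index("incidents"))
print(header)          # ['airline', 'incidents', 'fatalities']
print(sum(incidents))  # 7
```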

Homework:

  • Complete the Python homework assignment with the Chipotle data, add a commented Python script to your GitHub repo, and submit a link using the homework submission form. You have until Tuesday (9/1) to complete this assignment.
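Two details tend to trip people up with data like this. A tab-separated file needs delimiter="\t" in csv.reader, and prices stored as text must be cleaned before you can do arithmetic with them. The sketch below makes several assumptions: the file is tab-separated, the item_price column is formatted like "$2.39 ", and that price already reflects the quantity ordered. The rows are hypothetical. This is one possible approach, not the assigned solution.

```python
import csv

# Hypothetical tab-separated rows; the column layout and price format are assumptions.
SAMPLE_TSV = [
    "order_id\tquantity\titem_name\titem_price",
    "1\t1\tChips\t$2.39 ",
    "1\t2\tCanned Soda\t$2.18 ",
    "2\t1\tChicken Bowl\t$8.75 ",
]


def parse_price(text):
    """Convert a price string such as '$2.39 ' to a float."""
    return float(text.strip().lstrip("$"))


def average_order_total(lines):
    """Sum item prices per order, then average those totals across orders.

    Assumes item_price already includes quantity, so quantity is not multiplied in.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    order_col = header.index("order_id")
    price_col = header.index("item_price")
    totals = {}
    for row in reader:
        order = row[order_col]
        totals[order] = totals.get(order, 0.0) + parse_price(row[price_col])
    return sum(totals.values()) / len(totals)


print(round(average_order_total(SAMPLE_TSV), 2))  # 6.66
```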

Resources:


Class 4: Exploratory Data Analysis

Homework:

  • Complete "Exercise Three" from today's Pandas script. Note: You do not need to submit this assignment.
  • Read How Software in Half of NYC Cabs Generates $5.2 Million a Year in Extra Tips for an excellent example of exploratory data analysis.
  • The deadline for discussing your project ideas with an instructor is Tuesday (9/1), and your project question write-up is due Thursday (9/3).
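Exploratory analysis often begins by splitting rows into groups and summarizing each group. In pandas, a call such as df.groupby("genre")["duration"].mean() does this. The minimal sketch below shows the same split-apply-combine idea using only the standard library. The genre/duration records are hypothetical, and the class itself uses pandas rather than this code.

```python
from __future__ import print_function  # lets this run under Python 2.7 and 3

from collections import defaultdict

# Hypothetical (genre, duration) records standing in for two columns of a dataset.
RECORDS = [
    ("drama", 120),
    ("comedy", 95),
    ("drama", 140),
    ("comedy", 105),
    ("action", 130),
]


def group_means(records):
    """Split records by key, then average the values within each group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: float(sum(vals)) / len(vals) for key, vals in groups.items()}


for key, mean in sorted(group_means(RECORDS).items()):
    print(key, mean)
```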

Resources:
