The goal of this project is to get our hands "dirty" as budding data scientists (and to practice material taught in the class). The project gave us a much better appreciation for working with "data in the wild", a clearer sense of what it means to work as a data scientist, a deeper understanding of the class material and of how to use and debug machine learning models, hands-on experience with popular data science tools in Python, and a glimpse into some research efforts in data science.
Specifically, in this project we collected data, "wrangled" it by extracting, cleaning, matching, and integrating it into a single unified data set, and then analyzed that data set to infer insights.
The project was done in four stages:
- Stage 1: Information Extraction from Natural Text
- Stage 2: Crawling and Extracting Structured Data from Web Pages
- Stage 3: Entity Matching
- Stage 4: Integrating and Performing Analysis
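
For concreteness, the sketch below shows how these four stages might hand data to one another in a single Python pipeline. It is a minimal, hypothetical skeleton, assuming pandas as the tabular workhorse; the function names, column names, and toy inputs are placeholders for illustration, not the project's actual code.

```python
# Minimal, hypothetical skeleton of the four-stage workflow.
# All function names, column names, and toy inputs are placeholders.
import pandas as pd

def extract_from_text(documents: list[str]) -> pd.DataFrame:
    # Stage 1: turn natural text into structured records
    # (a real version would use regexes, rules, or an IE model).
    return pd.DataFrame({"doc_id": range(len(documents)), "text": documents})

def extract_from_web(pages: list[str]) -> pd.DataFrame:
    # Stage 2: parse crawled HTML pages into a second structured table.
    return pd.DataFrame({"page_id": range(len(pages)), "html": pages})

def match_entities(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    # Stage 3: pair rows from the two tables that refer to the same entity.
    # A real matcher would use blocking plus a similarity or learned model;
    # the cross join here only stands in for candidate generation.
    return a.merge(b, how="cross")

def analyze(unified: pd.DataFrame) -> pd.DataFrame:
    # Stage 4: run aggregate analysis over the integrated data set.
    return unified.describe(include="all")

if __name__ == "__main__":
    text_table = extract_from_text(["Alice founded Acme in 1999."])
    web_table = extract_from_web(["<html><title>Acme Corp</title></html>"])
    unified = match_entities(text_table, web_table)
    print(analyze(unified))
```

Each stage in the actual project is of course far more involved than this skeleton suggests; the point is only that every stage produces or consumes tables, which is what lets the final stage analyze one unified data set.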