NLP_Project_Proposal_Matt_Redmond.md

Can we use unsupervised learning and natural language processing (NLP) to determine whether a news story is from the New York Times or the Onion? Two datasets from Kaggle will be used:

https://www.kaggle.com/datasets/undefinenull/satirical-news-from-the-onion

https://www.kaggle.com/datasets/tmishinev/nyt-headlines-20102021

There are 40,051 news stories from the New York Times and 6,789 from the Onion. Each row in each dataset contains both the headline and the text of the article. The target will be a column added to the combined dataset indicating whether the story is from the NYT or the Onion.
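As a rough illustration of the labeling step, the sketch below adds a target field to each row and concatenates the two sources. It uses only the standard library; the field names `headline`, `text`, and `source`, the label strings, and the helper functions are assumptions made for this example, not the actual column names in the Kaggle files.

```python
from collections import Counter

# Hypothetical label values for the added target column.
NYT_LABEL = "NYT"
ONION_LABEL = "Onion"


def label_rows(rows, source):
    """Return copies of the rows with a 'source' target field added."""
    return [{**row, "source": source} for row in rows]


def combine(nyt_rows, onion_rows):
    """Label each dataset and concatenate them into one list of rows."""
    return label_rows(nyt_rows, NYT_LABEL) + label_rows(onion_rows, ONION_LABEL)


def class_shares(rows):
    """Fraction of rows belonging to each source label."""
    counts = Counter(row["source"] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Placeholder rows invented for the example; real rows would come from the CSVs.
    nyt = [{"headline": "placeholder NYT headline", "text": "placeholder body"}]
    onion = [{"headline": "placeholder Onion headline", "text": "placeholder body"}]
    combined = combine(nyt, onion)
    print(class_shares(combined))
```

With the counts given above, the combined dataset would have 46,840 rows, roughly 85.5% NYT and 14.5% Onion, so the class imbalance is worth keeping in mind when evaluating any model.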
The plan is to follow the project workflow outlined in Canvas, primarily using Python text processing libraries and tools along with visualization tools such as Plotly and seaborn. The only other tool that I think might be used is MongoDB. The MVP should contain some preliminary analysis and/or a path forward to the end result: a predictive model.
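One possible form of the preliminary analysis is a per-source word-frequency comparison. The sketch below is a minimal standard-library version; the tokenizer pattern and the `tokenize` and `top_words` helpers are assumptions for illustration, and a real pass would likely swap in one of the Python text processing libraries mentioned above and remove stop words.

```python
import re
from collections import Counter

# Simple assumption: a token is a run of lowercase letters or apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def top_words(headlines, n=10):
    """Return the n most frequent tokens across a list of headlines."""
    counts = Counter()
    for headline in headlines:
        counts.update(tokenize(headline))
    return counts.most_common(n)
```

Running `top_words` separately on the NYT and Onion headlines would give a first look at vocabulary differences between the two sources before any modeling.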