Can we use unsupervised learning and natural language processing (NLP) to determine whether a news story comes from the New York Times or the Onion? Two datasets from Kaggle will be used:

- The Onion: https://www.kaggle.com/datasets/undefinenull/satirical-news-from-the-onion
- New York Times: https://www.kaggle.com/datasets/tmishinev/nyt-headlines-20102021

There are 40,051 news stories from the New York Times and 6,789 from the Onion. Each row in both datasets contains the article's headline and text. The target will be a column added to the combined dataset that indicates whether each story comes from the NYT or the Onion.

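
A minimal sketch of building the combined dataset with its target column, using only the Python standard library. It assumes each article is read as a dict with `headline` and `text` keys and names the target `is_onion` (0 = NYT, 1 = the Onion); the real Kaggle column headers and file names are not given here, so `load_rows` and these names are hypothetical placeholders.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a CSV file into a list of dicts, one dict per article.

    Hypothetical helper: the Kaggle file names and column headers are
    assumptions and should be checked against the downloaded files.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def label_and_combine(nyt_rows, onion_rows):
    """Merge both sources into one dataset with a binary target column.

    The target name "is_onion" is an assumption: 0 = New York Times, 1 = the Onion.
    Each row is copied, so the input lists are left unchanged.
    """
    combined = [{**row, "is_onion": 0} for row in nyt_rows]
    combined += [{**row, "is_onion": 1} for row in onion_rows]
    return combined


# Placeholder rows standing in for the real CSV contents.
nyt = [{"headline": "placeholder NYT headline", "text": "placeholder body"}]
onion = [
    {"headline": "placeholder Onion headline", "text": "placeholder body"},
    {"headline": "another Onion headline", "text": "placeholder body"},
]
data = label_and_combine(nyt, onion)
print(Counter(row["is_onion"] for row in data))  # Counter({1: 2, 0: 1})
```

If the files are loaded with pandas instead, the same step could be written as `pd.concat([nyt_df.assign(is_onion=0), onion_df.assign(is_onion=1)], ignore_index=True)`.
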
The plan is to follow the project workflow outlined in Canvas, using primarily Python text-processing libraries along with visualization tools such as Plotly and seaborn. The only other tool likely to be used is MongoDB. The MVP should contain some preliminary analysis and/or a clear path toward the end goal of a predictive model.
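
Because the two sources are imbalanced, one possible piece of preliminary analysis is the accuracy of a trivial model that always predicts the larger class; any predictive model should beat it. The sketch below computes that baseline from the counts stated above (the counts may change after cleaning or deduplication, and `class_balance` is a hypothetical helper name).

```python
def class_balance(n_nyt, n_onion):
    """Summarize class balance and the majority-class baseline accuracy."""
    total = n_nyt + n_onion
    return {
        "total": total,
        "onion_share": n_onion / total,
        # Accuracy of always predicting whichever source has more stories.
        "majority_baseline_accuracy": max(n_nyt, n_onion) / total,
    }


stats = class_balance(40_051, 6_789)
print(
    f"{stats['total']} stories, {stats['onion_share']:.1%} Onion, "
    f"baseline accuracy {stats['majority_baseline_accuracy']:.1%}"
)
# 46840 stories, 14.5% Onion, baseline accuracy 85.5%
```

With roughly 85.5% accuracy available for free, metrics that focus on the minority class (such as precision and recall for Onion stories) are likely more informative than accuracy alone.
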
