This project aims to identify the most popular Bangla news articles and explain the factors that contribute to their popularity. The analysis uses data from Prothom Alo, a leading Bangla news website, and applies web scraping, data cleaning, exploratory data analysis, and visualization to uncover patterns and insights.
`For Prothom Alo.ipynb`: This Jupyter notebook covers the data collection and analysis workflow. It includes:
- Data scraping from Prothom Alo
- Data cleaning and preprocessing
- Exploratory Data Analysis (EDA)
- Popularity metrics calculation
- Visualization of findings
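The README does not say which popularity metric the notebook computes, so the following is only a minimal sketch under stated assumptions, not the notebook's actual method. It assumes each scraped article is a dict with hypothetical fields `title`, `views`, `comments`, and `shares`, and that counts may be rendered with Bangla digits (০-৯) and thousands separators. The `parse_bangla_count` helper is a hypothetical cleaning step, and the composite score (a weighted sum of per-metric z-scores, with arbitrary illustrative weights) is one common choice swapped in for illustration.

```python
import re
import statistics

# Map Bangla digits (U+09E6..U+09EF) to ASCII digits.
BANGLA_DIGITS = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")


def parse_bangla_count(text):
    """Hypothetical cleaning step: turn '১২,৫০০' or '1,234' into an int (0 if no digits)."""
    ascii_text = text.translate(BANGLA_DIGITS)
    digits = re.sub(r"[^0-9]", "", ascii_text)  # drop commas, spaces, and labels
    return int(digits) if digits else 0


def popularity_scores(articles, weights=None):
    """Rank articles by a weighted sum of z-scores.

    The metric names and default weights are assumptions for illustration.
    Returns (title, score) pairs sorted from most to least popular.
    """
    weights = weights or {"views": 0.5, "comments": 0.3, "shares": 0.2}
    stats = {}
    for metric in weights:
        values = [article[metric] for article in articles]
        mean = statistics.mean(values)
        # Population std dev; fall back to 1.0 when all values are equal.
        stdev = statistics.pstdev(values) or 1.0
        stats[metric] = (mean, stdev)

    scores = []
    for article in articles:
        score = sum(
            weight * (article[metric] - stats[metric][0]) / stats[metric][1]
            for metric, weight in weights.items()
        )
        scores.append((article["title"], score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data only; not taken from the project's dataset.
    toy = [
        {"title": "A", "views": parse_bangla_count("১২,৫০০"), "comments": 340, "shares": 120},
        {"title": "B", "views": parse_bangla_count("৮০০"), "comments": 12, "shares": 5},
        {"title": "C", "views": parse_bangla_count("5,000"), "comments": 90, "shares": 60},
    ]
    for title, score in popularity_scores(toy):
        print(f"{title}: {score:.3f}")
```

Standardizing each metric before weighting keeps raw view counts, which are usually much larger than comment or share counts, from dominating the score.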
`thesis.ipynb`: This Jupyter notebook contains the detailed thesis write-up, including:
- Introduction to the problem
- Literature review
- Methodology
- Results and discussion
- Conclusion and future work
To run the notebooks, you need to have Python and Jupyter Notebook installed on your system. Follow these steps to set up the environment:
Clone the repository:
git clone https://github.com/adeeb0005/identifying-popular-bangla-news.git
cd identifying-popular-bangla-news
Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
Install the required packages:
pip install -r requirements.txt
Start Jupyter Notebook:
jupyter notebook
Running the Analysis:
- Open `For Prothom Alo.ipynb` in Jupyter Notebook.
- Execute the cells sequentially to perform data scraping, cleaning, and analysis.
Reading the Thesis:
- Open `thesis.ipynb` in Jupyter Notebook.
- Follow the structured content to understand the project in detail.
The analysis reveals several key insights into the popularity of Bangla news articles, including:
- The types of news that attract the most readers.
- The influence of different factors (e.g., headline, publication time, category) on article popularity.
- Visualization of trends and patterns in the data.
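As a rough illustration of how factors such as category, publication time, and headline could be compared, here is a hypothetical sketch, not the notebook's code. It assumes each article is a dict with fields `category`, `published` (an ISO 8601 timestamp with UTC offset), `headline`, and `views`; these names and the 20-character headline buckets are assumptions. Group means like these are descriptive only and do not by themselves show that a factor causes popularity.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_by(articles, key_fn, metric="views"):
    """Group articles with key_fn and return {group: mean metric}, highest mean first."""
    groups = defaultdict(list)
    for article in articles:
        groups[key_fn(article)].append(article[metric])
    averages = {group: mean(values) for group, values in groups.items()}
    return dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))


def by_category(article):
    # Assumed field holding the section name, e.g. "sports".
    return article["category"]


def by_hour(article):
    # Assumes a timestamp such as "2023-05-14T08:30:00+06:00"; returns the local hour.
    return datetime.fromisoformat(article["published"]).hour


def headline_length_bucket(article, step=20):
    # Hypothetical headline feature: length bucketed into ranges of `step`.
    # len() counts Unicode code points, so Bangla vowel signs and conjuncts
    # make this larger than the number of visible characters.
    n = len(article["headline"])
    low = n // step * step
    return f"{low}-{low + step - 1}"
```

For example, `mean_by(articles, by_hour)` gives average views per publication hour, and passing `headline_length_bucket` or `by_category` instead compares headline lengths or news categories in the same way.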
This project examines what makes Bangla news articles popular. It uses data science techniques to derive actionable insights that can help news organizations optimize their content strategy.
Future work could include:
- Extending the analysis to other Bangla news websites.
- Incorporating social media data to understand the broader impact.
- Using advanced machine learning models to predict article popularity.
For any queries or suggestions, please contact:
- Abdul Muhaimin Adeeb: [[email protected]]
- Affiliation: Session 2018-19, Dept. of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj.