This project analyzes global suicide rates using data from 1990 to 2022. It explores trends over time, compares regions and countries, and examines the relationship between socio-economic factors and suicide rates. It also handles missing data so that gaps in the dataset do not distort the visualizations.
- Introduction
- Dataset
- Handling Missing Values
- Visualizations
- Setup and Installation
- How to Run the Project
- Usage
- Contributing
- License
The goal of this project is to provide insights into global suicide rates, identify trends, and examine the socio-economic factors that may influence these rates. The project also demonstrates how to handle missing values in the dataset and create effective visualizations for data analysis.
The dataset contains global suicide statistics and related socio-economic factors for various countries and regions from 1990 to 2022. The key columns in the dataset include:
- RegionCode, RegionName, CountryCode, CountryName
- Year, Sex, AgeGroup, Generation
- SuicideCount, CauseSpecificDeathPercentage, DeathRatePer100K
- Population, GDP, GDPPerCapita, GrossNationalIncome, GNIPerCapita
- InflationRate, EmploymentPopulationRatio
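As a quick sanity check before any analysis, the column list above can be compared against a loaded DataFrame. This is a minimal sketch, not code from the notebook: the helper names `missing_columns` and `missing_value_summary` and the CSV file name in the usage comment are hypothetical.

```python
from typing import List

import pandas as pd

# Column names exactly as listed above.
EXPECTED_COLUMNS = [
    "RegionCode", "RegionName", "CountryCode", "CountryName",
    "Year", "Sex", "AgeGroup", "Generation",
    "SuicideCount", "CauseSpecificDeathPercentage", "DeathRatePer100K",
    "Population", "GDP", "GDPPerCapita", "GrossNationalIncome", "GNIPerCapita",
    "InflationRate", "EmploymentPopulationRatio",
]


def missing_columns(df: pd.DataFrame) -> List[str]:
    """Return the expected columns that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def missing_value_summary(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, largest count first."""
    return df.isna().sum().sort_values(ascending=False)


# Example usage (the file name is an assumption, not the project's real path):
# df = pd.read_csv("suicide_rates.csv")
# print(missing_columns(df))
# print(missing_value_summary(df).head())
```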
Missing values in the dataset were handled using various strategies such as forward fill, mode (for categorical variables), mean, and linear interpolation (for numerical time-series data).
To ensure the dataset was clean and ready for analysis, the following steps were taken:
- Forward Fill (ffill): Used for RegionCode, CountryName, and Year, where missing values were filled with the previous entry.
- Mode: Used for categorical columns like Sex, AgeGroup, and Generation, where missing values were filled with the most frequent value (mode).
- Mean: Applied to columns such as CauseSpecificDeathPercentage and DeathRatePer100K to estimate missing values based on the average.
- Interpolation: Used for numerical columns like SuicideCount, Population, GDPPerCapita, etc., to smoothly estimate missing values based on surrounding data points.
For detailed code on handling missing values, refer to the notebook.
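The four strategies above could look roughly like the following in pandas. This is a sketch under stated assumptions rather than the notebook's actual code: the function name `fill_missing` is hypothetical, only the columns named above are covered, and the fills run over the rows in their existing order (the notebook may sort or group by country first).

```python
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four fill strategies to a copy of df and return it."""
    out = df.copy()

    # Forward fill: carry the previous row's value into each gap.
    for col in ["RegionCode", "CountryName", "Year"]:
        out[col] = out[col].ffill()

    # Mode: replace missing categories with the most frequent value.
    for col in ["Sex", "AgeGroup", "Generation"]:
        modes = out[col].mode()  # mode() ignores missing values
        if not modes.empty:
            out[col] = out[col].fillna(modes.iloc[0])

    # Mean: replace missing numbers with the column average.
    for col in ["CauseSpecificDeathPercentage", "DeathRatePer100K"]:
        out[col] = out[col].fillna(out[col].mean())

    # Linear interpolation: estimate gaps from the neighbouring rows.
    # Gaps at the very start of a column stay missing, because there is
    # no earlier value to interpolate from.
    for col in ["SuicideCount", "Population", "GDPPerCapita"]:
        out[col] = out[col].interpolate(method="linear")

    return out
```

Because forward fill and interpolation depend on row order, it is worth confirming that rows are ordered sensibly (for example by country and year) before applying them.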
The project includes a variety of visualizations that help illustrate trends and patterns in suicide rates across regions and time. These include:
- Bar Plot: Total suicide counts by country.
- Line Plot: Global suicide counts over time.
- Histogram: Distribution of suicide rates (per 100K population).
- Scatter Plot: Relationship between suicide rates and GDP per capita.
- Box Plot: Distribution of suicide rates by gender.
- Heatmap: Correlation between key variables like GDP, population, and suicide rates.
- Pie Chart: Proportion of suicides by age group.
- Stacked Bar Plot: Suicide counts by region and gender.
For detailed code on the visualizations, refer to the notebook.
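Two of the plots listed above, the line plot and the scatter plot, can be sketched with matplotlib as follows. The function names `plot_global_counts` and `plot_rate_vs_gdp` are hypothetical, and the notebook may use seaborn or different styling; this only illustrates the aggregation behind each chart.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_global_counts(df: pd.DataFrame, ax=None):
    """Line plot: SuicideCount summed over all rows for each Year."""
    if ax is None:
        _, ax = plt.subplots()
    yearly = df.groupby("Year")["SuicideCount"].sum()
    ax.plot(yearly.index, yearly.values, marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Suicide count")
    ax.set_title("Global suicide counts over time")
    return ax


def plot_rate_vs_gdp(df: pd.DataFrame, ax=None):
    """Scatter plot: DeathRatePer100K against GDPPerCapita, one point per row."""
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(df["GDPPerCapita"], df["DeathRatePer100K"], alpha=0.5)
    ax.set_xlabel("GDP per capita")
    ax.set_ylabel("Deaths per 100K population")
    ax.set_title("Suicide rate vs. GDP per capita")
    return ax
```

Both functions return the Axes object, so in a notebook the figure renders when the cell finishes, and in a script you can call `plt.show()` or `ax.figure.savefig(...)` afterwards.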
To run the project locally, you need:
- Python 3.7 or higher
- Jupyter Notebook (recommended for viewing the analysis)
- Libraries: pandas, matplotlib, seaborn, numpy
With these in place, set up the project:
- Clone the repository:
git clone https://github.com/yourusername/suicide-rate-analysis.git
- Navigate to the project directory:
cd suicide-rate-analysis
- Install the required Python packages:
pip install -r requirements.txt
- Launch Jupyter Notebook:
jupyter notebook
- Open the notebook file suicide_analysis.ipynb.
- Run the notebook cells to process the data and generate visualizations.
Alternatively, you can run the Python script version of the analysis:
python suicide_analysis.py
This project can be used to:
- Study trends in global suicide rates across time.
- Explore how socio-economic factors relate to suicide rates.
- Visualize the distribution of suicide rates across different regions, genders, and age groups.
- Develop predictive models based on the correlations between variables like GDP, population, and suicide rates.
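As a starting point for the modelling use case, a one-variable least-squares fit gives a simple baseline. The sketch below is an illustration, not part of the project: the function name `fit_rate_on_log_gdp` is hypothetical, and taking the base-10 logarithm of GDP per capita is an assumption made here because income data is usually heavily skewed.

```python
import numpy as np
import pandas as pd


def fit_rate_on_log_gdp(df: pd.DataFrame):
    """Fit DeathRatePer100K = slope * log10(GDPPerCapita) + intercept.

    Returns (slope, intercept) from an ordinary least-squares line.
    """
    data = df[["GDPPerCapita", "DeathRatePer100K"]].dropna()
    # log10 is only defined for positive values, so drop the rest.
    data = data[data["GDPPerCapita"] > 0]
    x = np.log10(data["GDPPerCapita"].to_numpy(dtype=float))
    y = data["DeathRatePer100K"].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)
```

A fitted slope only describes an association in the data; it does not show that GDP causes changes in suicide rates.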
Contributions are welcome! If you'd like to contribute to this project, please fork the repository and create a pull request with your changes. You can also open issues to report bugs or suggest new features.
- Fork this repository.
- Create a new branch with your feature or bug fix:
git checkout -b feature-name
- Commit your changes and push to your branch:
git push origin feature-name
- Open a pull request and describe the changes you’ve made.
This project is licensed under the MIT License. See the LICENSE file for more details.