This repository contains code for analyzing customer segmentation using K-means clustering on the Mall Customers dataset.
- `customer_segmentation.ipynb`: Jupyter notebook containing the analysis code.
- `Mall_Customers.csv`: Dataset used for analysis.
The analysis includes data cleaning, exploratory data analysis (EDA), and clustering of customers based on their demographic and spending behavior.
- Renamed column 'Genre' to 'Gender'.
- Dropped 'CustomerID' column as it was not necessary for analysis.
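
The two cleaning steps above might look roughly like the following sketch. It assumes the CSV carries the `Genre` and `CustomerID` columns named above; the small inline frame is a stand-in for `Mall_Customers.csv`, and its values and the helper name `clean` are invented for illustration.

```python
import pandas as pd

# Tiny stand-in for pd.read_csv("Mall_Customers.csv"); the values are invented.
raw = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Genre": ["Male", "Female", "Female"],
    "Age": [19, 21, 35],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Rename 'Genre' to 'Gender' and drop the 'CustomerID' identifier column."""
    return df.rename(columns={"Genre": "Gender"}).drop(columns=["CustomerID"])


cleaned = clean(raw)
print(cleaned.columns.tolist())
```

Both `rename` and `drop` return new frames, so `raw` is left unchanged and the cleaning can be re-run safely.
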
- Plotted histograms for Age, Annual Income, and Spending Score.
- Used seaborn's `histplot` with `kde=True` for density plot overlay.
- Count plot of Gender distribution among customers.
- Bar plot showing the number of customers in different age groups.
- Bar plot showing the number of customers in different spending score ranges.
- Applied K-means clustering to group customers based on Age and Spending Score.
- Plotted clusters on a scatter plot.
- Applied K-means clustering to group customers based on Annual Income and Spending Score.
- Plotted clusters on a scatter plot.
- Visualized clusters of customers in 3D space using Age, Annual Income, and Spending Score.
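
The clustering steps above can be sketched with scikit-learn's `KMeans` as follows. This is a minimal example on synthetic data: the column names `Annual Income (k$)` and `Spending Score (1-100)` are assumed, and `n_clusters=3` is chosen only to match the synthetic blobs, not the number of clusters used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the income and spending columns; column names are assumed.
rng = np.random.default_rng(0)
centers = np.array([[20, 80], [60, 50], [100, 20]])
points = np.vstack([c + rng.normal(scale=3.0, size=(30, 2)) for c in centers])
df = pd.DataFrame(points, columns=["Annual Income (k$)", "Spending Score (1-100)"])

# n_clusters=3 fits the synthetic data only; the notebook's choice of k may differ.
features = df[["Annual Income (k$)", "Spending Score (1-100)"]]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df["Cluster"] = kmeans.fit_predict(features)

print(df["Cluster"].value_counts().sort_index())
print(kmeans.cluster_centers_.round(1))
```

Passing `df["Cluster"]` as the `c` argument of matplotlib's `scatter` colors the points by cluster. Adding Age as a third feature and plotting on axes created with `projection="3d"` extends the same approach to the 3D view.
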
- Python 3.x
- Libraries: numpy, pandas, matplotlib, seaborn, scikit-learn
1. Clone the repository:

       git clone https://github.com/your-username/customer-segmentation.git
       cd customer-segmentation

2. Install required libraries:

       pip install -r requirements.txt

3. Run the Jupyter notebook `customer_segmentation.ipynb` to see the analysis.
The clustering results show distinct groups of customers based on their spending behavior and demographic characteristics.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.