CSV Data Explorer with Streamlit
Overview
This project is a Streamlit-based web application that allows users to upload CSV files and explore them interactively. The app provides various features such as:
Data Overview: Displays the first few rows of the uploaded dataset. Basic Statistics: Computes and displays basic statistics like mean, median, and standard deviation for each column. Data Visualization: Generates pairplots and heatmaps for numerical columns in the dataset. Column Statistics: Allows users to select a column and view statistics (mean, median, standard deviation). This tool is ideal for quick exploration and analysis of CSV data without the need for complex software or setups.
Tech Stack
This project uses the following technologies:
Streamlit: An open-source app framework in Python for building web apps for data science and machine learning. Pandas: A powerful data manipulation library for Python, which is used to load, manipulate, and process CSV data. NumPy: A library for numerical computations, which is used for handling numerical data in the CSV files. Matplotlib: A plotting library for creating visualizations, including graphs and charts. Seaborn: A statistical data visualization library based on Matplotlib, which is used for creating attractive and informative visualizations like pairplots and heatmaps. pytest: A testing framework for Python that is used to test the application and ensure the reliability of the data processing and statistics functions. GitHub Actions: For continuous integration (CI) to automate the testing and deployment process. Python: The primary programming language for this project (version 3.8 or higher). Installation To run this project locally, follow the instructions below.
Prerequisites Python 3.8+: Ensure you have Python installed on your system. You can download it from the official Python website. Git: If you want to clone the repository and manage the project with Git. Steps to Set Up Clone the repository:
Open a terminal and run the following command to clone the repository:
git clone https://github.com/your-username/your-repository-name.git cd your-repository-name Create a virtual environment (optional but recommended):
You can create a virtual environment using conda or venv. Here’s how to do it with conda:
conda create --name myenv python=3.8 conda activate myenv Or, using venv:
python -m venv myenv
source myenv/bin/activate # On Windows use myenv\Scripts\activate
Install dependencies:
After activating your virtual environment, run: pip install -r requirements.txt This will install all the required libraries for the project.
Run the Streamlit app:
Once all dependencies are installed, you can start the app by running:
streamlit run app.py This will launch the Streamlit app locally. You can view it in your web browser at http://localhost:8501.
Usage Once the app is running:
Upload a CSV file:
Click on the Upload a CSV file button to upload your CSV dataset. The app will display the first few rows of the data for a quick preview. Data Overview:
The app will automatically show the first few rows and basic statistics (mean, median, standard deviation) for numerical columns. Data Visualization:
If the dataset contains numerical columns, the app will generate a pairplot of the numerical columns. It will also show a correlation heatmap for the numerical columns. Column Statistics:
You can select any column from the dropdown and view its detailed statistics, including mean, median, and standard deviation. Testing This project uses pytest for testing the data processing functions. To run the tests, follow these steps:
Ensure that you have installed all dependencies (including pytest).
Run the tests by executing:
pytest
This will run the tests in the test.py file and show the results in the terminal.
GitHub Actions (CI/CD) This project includes a GitHub Actions pipeline for continuous integration (CI). When you push changes to the repository, the workflow will automatically run to:
Set up the Python environment. Install dependencies. Run tests to ensure the integrity of the code. CI Workflow The workflow file is located at .github/workflows/python-app.yml. It triggers on push and pull_request events to the main branch and performs the following steps:
Set up Python 3.8: Uses the actions/setup-python action to configure the Python version. Install dependencies: Installs dependencies from the requirements.txt file. Run tests: Executes the tests using pytest. Deployment (Optional) You can deploy this app to a cloud platform or a service like Streamlit Cloud, Heroku, AWS, or Google Cloud.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Special thanks to the creators of Streamlit, Pandas, Matplotlib, and Seaborn for building amazing libraries that made this project possible.