“I Wish that I Could Be Like the Cool Kids”: an Analysis for Underexposed Indie Artists on Spotify
Sydney Hu (8560984456)
Stephanie Shaw (2344808735)
To run this project, you need to have Conda installed. If you don't have Conda, you can install Miniconda (a minimal installer for Conda) or Anaconda (which includes Conda and other useful data science tools).
- Clone the Repository: First, clone this repository to your local machine using:
  git clone https://github.com/USC-DSCI-510/final-project.git
- Create and Activate a Virtual Environment:
  conda create -n my_env python=3.10.12 -y
  This command creates a Conda virtual environment called my_env that uses Python 3.10 (specifically 3.10.12). Activate it with:
  conda activate my_env
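Once my_env is active, one quick way to confirm that python resolves to the interpreter you just installed is to compare its version with the pinned one. This is a minimal sketch that assumes you save it as a script and run it with python inside the activated environment; the helper check_python_version is hypothetical and not part of the project.

```python
import sys


def check_python_version(version_info=sys.version_info, expected=(3, 10)):
    """Hypothetical helper (not part of the project): return True if the
    interpreter's (major, minor) version matches `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    current = sys.version.split()[0]
    if check_python_version():
        print("Python version OK:", current)
    else:
        print("Unexpected Python version:", current, "(is my_env activated?)")
```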
- Install Required Packages:
  pip install -r final-project/requirements.txt
  Run this from the directory into which you cloned the repository. This command installs all the packages listed in the requirements.txt file.
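If you want to confirm that the install step succeeded, the sketch below checks each name listed in requirements.txt against the installed distributions using the standard library's importlib.metadata. It assumes requirements.txt contains ordinary "name==version" style lines; the helpers requirement_name and missing_packages are hypothetical, and the parsing is deliberately simple rather than a full implementation of pip's requirement syntax.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def requirement_name(line):
    """Extract the distribution name from one requirements.txt line.
    Returns None for blank lines, comments, and pip options such as '-r'."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None
    # The name ends at the first extras bracket, version specifier,
    # environment marker, or whitespace.
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_packages(lines):
    """Return the names listed in `lines` that are not installed."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("final-project/requirements.txt")  # path used in the install step
    if req.exists():
        print("Missing:", missing_packages(req.read_text().splitlines()) or "none")
    else:
        print(f"{req} not found; run this from the folder containing final-project/")
```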
- Open Jupyter Notebook:
  If Jupyter Notebook is not installed in the selected Conda environment, you can install it using:
  conda install -n my_env jupyter
  You must register your Conda environment with Jupyter before it can be used in Jupyter notebooks. Run the following commands to register this Conda environment with Jupyter:
  conda install -n my_env ipykernel
  python -m ipykernel install --user --name my_env --display-name "my_env"
With the Conda environment activated, launch Jupyter Notebook from the terminal by typing:
jupyter notebook
This will open Jupyter in your default web browser. Navigate to the desired notebook in the src folder. Then make sure the correct Conda environment is selected: inside the notebook, go to Kernel -> Change Kernel and select the kernel associated with your Conda environment (my_env). You are now ready to run the project within this environment.
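To double-check the kernel selection from inside a notebook cell, you can inspect which interpreter the kernel is running. This sketch assumes Conda's default layout, where environments live under .../envs/<name>/; environments created with a custom prefix (and the base environment) will not be recognized. The helper kernel_env_name is hypothetical.

```python
import sys
from pathlib import Path


def kernel_env_name(executable=sys.executable):
    """Guess the Conda environment name from the interpreter path.
    With Conda's default layout the interpreter sits under .../envs/<name>/,
    so return the path component after 'envs', or None if there is none."""
    parts = Path(executable).parts
    for i, part in enumerate(parts[:-1]):
        if part == "envs":
            return parts[i + 1]
    return None


if __name__ == "__main__":
    print("Kernel interpreter:", sys.executable)
    print("Conda environment:", kernel_env_name())  # expect 'my_env'
```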
Because the music industry changes quickly, we scrape data from dynamically generated pages. To get the most up-to-date collection of songs, run get_data.ipynb.
- Note: because our sources are dynamic, the newly scraped data may differ from the raw data stored in group1_raw.csv, group2_raw.csv, and group3_raw.csv.
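If you want to see how much a fresh scrape differs from the stored data, a row-level comparison is one option. This is a minimal sketch using the standard csv module; it assumes both files share the same header row, and the fresh file name in the example below (group1_fresh.csv) is hypothetical, since the names of get_data.ipynb's output files are not documented here.

```python
import csv
from collections import Counter


def diff_csv_rows(stored_path, fresh_path):
    """Return (only_in_stored, only_in_fresh) as Counters of row tuples.
    The header rows must match; data rows are compared as whole rows."""

    def load(path):
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, [])
            return header, Counter(tuple(row) for row in reader)

    stored_header, stored_rows = load(stored_path)
    fresh_header, fresh_rows = load(fresh_path)
    if stored_header != fresh_header:
        raise ValueError(f"Column mismatch: {stored_header} vs {fresh_header}")
    # Counter subtraction keeps only positive counts, i.e. rows (including
    # duplicates) that appear more often in one file than in the other.
    return stored_rows - fresh_rows, fresh_rows - stored_rows
```

For example, diff_csv_rows("data/group1_raw.csv", "group1_fresh.csv") would return the rows that disappeared and the rows that are new. Counters are used instead of sets so that duplicate rows are counted correctly.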
For the following sections (Cleaning, Analysis, Visualization), we recommend using the files provided in the data folder: to get results consistent with the report, use group1_raw.csv, group2_raw.csv, and group3_raw.csv from the data folder and run clean_analyze_visualize.ipynb. Each section is clearly labeled in the notebook.
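When loading the stored files, failing early with a clear message can save a confusing error later in the notebook. The sketch below assumes the data folder sits next to src in the repository, so a notebook in src would pass Path("..") / "data"; the helper raw_data_paths and the constant RAW_FILES are hypothetical.

```python
from pathlib import Path

# The three stored raw files named in this README.
RAW_FILES = ("group1_raw.csv", "group2_raw.csv", "group3_raw.csv")


def raw_data_paths(data_dir):
    """Return paths to the three raw CSVs in `data_dir`, raising
    FileNotFoundError that names every missing file."""
    data_dir = Path(data_dir)
    paths = [data_dir / name for name in RAW_FILES]
    missing = [p.name for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {data_dir}: {', '.join(missing)}")
    return paths
```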
- To clean the data: run each cell in the Cleaning section.
- To run the analysis: run each cell in the Analysis section. In the unlikely event that your results are inconsistent with the report, consider using group1_processed.csv and group2_processed.csv.
- To create the visualizations: run each cell in the Visualization section. Run this section only after the Analysis section, since the visualizations are based on the analyses.