This is my final year project. I'm uploading it here in the hope that it helps you, but please do not copy and paste it directly.
This project develops a music mood classification system that uses audio features only, classified with convolutional neural networks (CNNs). The system reached an overall accuracy above 0.90. There are two classifiers: the first takes a 1D input data format and the second a 2D input data format. The 1D classifier reached 0.91 overall accuracy and the 2D classifier reached 0.96. Finally, I built a simple testing user interface.
- Python
- Audio processing: Pydub, IPython
- Feature extraction: Librosa
- Data processing: NumPy, Pandas, scikit-learn
- Classification: TensorFlow
- Model evaluation: Matplotlib
- Web development: Django
- R Language
- EDA: Tidyverse
- HTML/CSS/JavaScript
- 900 ~30-second clips gathered from the AllMusic API
- The files are organized in 4 folders (Q1 to Q4)
- Two metadata CSV files with annotations and extra metadata (I didn't use these)
- A total of 225 music pieces per class, so every class in the database has an equal number of samples
Source: http://mir.dei.uc.pt/downloads.html
If you use it, please cite the following article(s):
Panda R., Malheiro R., & Paiva R. P. (2018). "Novel Audio Features for Music Emotion Recognition". IEEE Transactions on Affective Computing (IEEE early access). DOI: 10.1109/TAFFC.2018.2820691.
Panda R., Malheiro R., & Paiva R. P. (2018). "Musical Texture and Expressivity Features for Music Emotion Recognition". 19th International Society for Music Information Retrieval Conference (ISMIR 2018), Paris, France.
- 400 ~30-second clips of Turkish verbal and non-verbal music from different genres
- The files are classified into 4 classes: Happy, Angry, Sad and Relaxed
- A total of 100 music pieces per class, so every class in the database has an equal number of samples
Source: https://www.kaggle.com/datasets/blaler/turkish-music-emotion-dataset
If you use it, please cite the following article(s):
Bilal Er, M., & Aydilek, I. B. (2019). "Music Emotion Recognition by Using Chroma Spectrogram and Deep Visual Features". International Journal of Computational Intelligence Systems, 12(2), 1622–1634. DOI: https://doi.org/10.2991/ijcis.d.191216.001
We first check the length of every audio file, remove any clip shorter than 25.5 s, and cut all remaining clips to exactly 25.5 s.
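A minimal sketch of this length filter with Pydub; the `audio/` and `trimmed/` folder names are illustrative, not the project's actual layout:

```python
import os
from pydub import AudioSegment

TARGET_MS = 25500  # 25.5 s in milliseconds
os.makedirs("trimmed", exist_ok=True)

for name in os.listdir("audio"):
    clip = AudioSegment.from_file(os.path.join("audio", name))
    if len(clip) < TARGET_MS:
        continue  # discard clips shorter than 25.5 s
    # Pydub slices by milliseconds, so this cuts every clip to exactly 25.5 s
    clip[:TARGET_MS].export(os.path.join("trimmed", name), format="wav")
```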
For the 1D data input, I extracted tempo, key, scale, the mean and variance of the MFCCs, root mean square energy, mel-spectrogram, tonnetz, chroma features, zero crossing rate, spectral roll-off, and spectral centroid (57 input features in total) and stored them in an Excel file.
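A partial sketch of the 1D extraction with Librosa, covering a subset of the 57 features; key and scale estimation are omitted, and the choice of mean/variance summaries is my assumption:

```python
import librosa
import numpy as np
import pandas as pd

def extract_1d_features(path):
    y, sr = librosa.load(path, sr=22050)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    feats = {"tempo": float(tempo)}
    # mean and variance of each MFCC coefficient
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    for i, coef in enumerate(mfcc):
        feats[f"mfcc{i}_mean"] = coef.mean()
        feats[f"mfcc{i}_var"] = coef.var()
    # single-value summaries of the remaining spectral features
    feats["rms_mean"] = librosa.feature.rms(y=y).mean()
    feats["zcr_mean"] = librosa.feature.zero_crossing_rate(y).mean()
    feats["rolloff_mean"] = librosa.feature.spectral_rolloff(y=y, sr=sr).mean()
    feats["centroid_mean"] = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    feats["chroma_mean"] = librosa.feature.chroma_stft(y=y, sr=sr).mean()
    feats["tonnetz_mean"] = librosa.feature.tonnetz(y=y, sr=sr).mean()
    return feats

# usage (with `paths` defined elsewhere):
# pd.DataFrame([extract_1d_features(p) for p in paths]).to_excel("features.xlsx")
```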
For the 2D data input, I extracted only the mel-spectrogram, MFCCs, and spectrogram, since a few features are enough for the 2D classifier to achieve high accuracy. They are stored in an .npz file.
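A minimal sketch of the 2D extraction for the mel-spectrogram case; the file name and `n_mels` value are my assumptions:

```python
import librosa
import numpy as np

def mel_image(path):
    y, sr = librosa.load(path, sr=22050)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    return librosa.power_to_db(mel, ref=np.max)  # log scale for the CNN

# all clips have equal length (25.5 s), so the arrays stack cleanly
# usage (with `paths` and `labels` defined elsewhere):
# mels = np.array([mel_image(p) for p in paths])
# np.savez_compressed("mel_features.npz", X=mels, y=np.array(labels))
```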
Related file:
Before training, we need to preprocess the data: split features from labels, split into train/validation/test sets, encode features and labels numerically (1D features), resize the features (2D features), scale the features for each set, and finally reshape the data to fit the classifiers. A minimal sketch of the 1D pipeline is shown below.
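This sketch assumes the Excel feature table has a "mood" label column; the column name and 70/15/15 split ratio are illustrative:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_excel("features.xlsx")
X = df.drop(columns=["mood"])
y = LabelEncoder().fit_transform(df["mood"])  # mood names -> integers

# train/val/test split (70/15/15), stratified to keep classes balanced
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, stratify=y_tmp)

# fit the scaler on the training set only, then apply it to every split
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(s) for s in (X_train, X_val, X_test))

# reshape to (samples, features, 1) so the data fits the Conv1D classifier
X_train, X_val, X_test = (s.reshape(len(s), -1, 1) for s in (X_train, X_val, X_test))
```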
I designed 5-layer and 4-layer CNNs, consisting of 3 (or 2) Conv1D + max-pooling blocks and 2 fully connected layers. Dataset 1 made the model overfit, and I tried the following methods to tackle the problem:
- Add regularizer
- Add dropout
- Reduce number of features
- Simplify the network

None of these worked, so I switched to dataset 2. The performance of the model trained on dataset 2 improved a lot (82% accuracy).
Related file: MMC Classification 1D
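For reference, a minimal Keras sketch matching the 5-layer description above; the filter counts, kernel sizes, and dropout rate are my assumptions, not the project's exact settings:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_1d_model(n_features=57, n_classes=4):
    return keras.Sequential([
        layers.Input(shape=(n_features, 1)),
        # 3 Conv1D + max-pooling blocks
        layers.Conv1D(64, 3, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 3, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 3, activation="relu"),
        layers.MaxPooling1D(2),
        # 2 fully connected layers
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),  # one of the anti-overfitting measures tried
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_1d_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```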
I designed a 5-layer CNN, consisting of 3 Conv2D + max-pooling blocks and 2 fully connected layers. Dataset 1 still made the model overfit, though less severely than with the 1D classifier. I didn't try to solve the overfitting here; instead, I switched directly to dataset 2. I trained the model on the spectrogram, MFCCs, and mel-spectrogram respectively and tried different resizing sizes. Resizing to something small like 64 x 64 makes the model underfit, because too much information is lost; something too large, like 1000 x 1000, is very time-consuming and requires a lot of storage. After several tries, I used 300 x 300 for the spectrogram, 600 x 120 for the MFCCs, and 400 x 300 for the mel-spectrogram. Finally, I ensembled the three models by majority voting. The ensemble achieved 80% overall accuracy.
Related file: MMC Classification 2D
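For reference, a minimal Keras sketch of the 2D architecture and the majority vote; the layer widths and the `majority_vote` helper are my assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_2d_model(input_shape, n_classes=4):
    return keras.Sequential([
        layers.Input(shape=input_shape),  # e.g. (300, 300, 1) for spectrograms
        # 3 Conv2D + max-pooling blocks
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        # 2 fully connected layers
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

def majority_vote(models, inputs):
    # each model sees its own representation (spectrogram / MFCC / mel)
    preds = [m.predict(x).argmax(axis=1) for m, x in zip(models, inputs)]
    votes = np.stack(preds)  # shape: (3 models, n_samples)
    # most frequent predicted class per sample across the three models
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```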
To find out why dataset 2 works better than dataset 1, I used boxplots and histograms in R to show how the mean distributions of the 1D features differ across mood classes, and I also visualized the audio features with Python's Matplotlib. The EDA results show that in dataset 2 the value distributions of the features are distinct for each mood class, which is why dataset 2 performs better.
Related file:
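The Tidyverse code lives in the related file; as an illustration, an equivalent per-class boxplot can be drawn in Python with Pandas/Matplotlib (the "tempo" and "mood" column names are assumptions):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("features.xlsx")
df.boxplot(column="tempo", by="mood")  # one boxplot of tempo per mood class
plt.title("Tempo distribution by mood")
plt.suptitle("")  # drop the automatic grouped-boxplot super title
plt.show()
```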
Although the results using dataset 2 reached 80% accuracy, which met my expectations, there are still problems with this dataset. The training process for both the 1D and 2D classifiers is very unstable, and the gap between validation and testing accuracy is large (e.g. val_acc: 0.78, test_acc: 0.90), which makes the system less robust. Below is one example. This is caused by insufficient data in the dataset.
So, I applied audio data augmentation using two techniques: adding noise and time shifting. This enlarged my dataset from 400 to 1200 clips, and all the augmented data went into the training and validation sets only. The results became better: the training process is stable, model accuracy is higher, and the difference between val_acc and test_acc is smaller. A minimal sketch of the two techniques follows; below it is one example of training accuracy:
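In this sketch the noise level and shift range are my guesses, not the project's exact parameters:

```python
import numpy as np

def add_noise(y, noise_level=0.005):
    # additive white Gaussian noise scaled by a small factor
    return y + noise_level * np.random.randn(len(y))

def time_shift(y, sr, max_shift_s=2.0):
    # circular shift of the waveform by up to +/- 2 s
    shift = np.random.randint(-int(max_shift_s * sr), int(max_shift_s * sr))
    return np.roll(y, shift)

# each original clip yields two extra samples (400 -> 1200 clips overall):
# y, sr = librosa.load(path)  # per-clip waveform, loaded as elsewhere
# augmented = [add_noise(y), time_shift(y, sr)]
```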
With augmented data, the model reached 94% accuracy (1D classifier) and 93% accuracy (2D classifier). Related files:
- MMC Audio Augmentation
- MMC Classification with Augmented Data 1D
- MMC Classification with Augmented Data 2D
To test the model, enter any song you like and a model_num (1 or 2) to choose the classifier: 1 for the 1D classifier and 2 for the 2D classifier. Run all the related code and type in the filename and model number to get the predicted mood. Note: the song must be downloaded and stored in the SampleAudio folder under templates in the mmc folder.
Related file: MMC Model Testing
If you want a more convenient and better-looking testing interface, first change the directory to mmc (depending on where it is stored on your computer) and run `python manage.py runserver` in the terminal to launch the website. Note: the song must be downloaded and stored in the SampleAudio folder under templates in the mmc folder, and you must upload the song from the SampleAudio folder.
Related folder: mmc