Understand and Run Naive Bayes Algorithm on Dry Beans dataset

havelhakimi/DryBeans

Comparison of Naive Bayes Algorithm and Multinomial Logistic Regression on Dry Beans Dataset

This is a solution notebook for an assignment question given in a graduate Data Mining course. Each code block is accompanied by relevant analysis where required.
Dataset link: https://archive.ics.uci.edu/ml/datasets/Dry+Bean+Dataset
Broadly, the following steps have been performed in this solution notebook:

  • Plotted and analyzed the class distribution of the dataset.
  • Performed EDA (histograms, box plots, etc.) and provided various insights on the data.
  • Used the t-SNE algorithm to reduce the data to 2 dimensions and plotted the result as a scatter plot.
    • This helps in observing the separability of the data.
  • Ran the sklearn implementation of Gaussian Naive Bayes and Multinomial Naive Bayes.
    • Reported Accuracy, Recall, and Precision, and analyzed the differences between the two Naive Bayes implementations using an 80:20 train-test split.
  • Used Principal Component Analysis (PCA) to reduce the number of features and used the reduced dataset for model training.
    • Retained different fractions of the variance, ranging from 0.90 to 1.00 in steps of 0.01.
    • Compared the results using Accuracy, Precision, Recall and F1-score.
  • Plotted ROC-AUC curves
  • Additionally trained a Multinomial Logistic Regression model and compared its results with Naive Bayes.
The assumptions above and the flow of work follow the questions asked in the assignment.
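
The modelling steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It assumes a synthetic stand-in dataset from `make_classification` with the same shape as Dry Beans (16 features, 7 classes) so that it runs without downloading anything. The helper `evaluate`, the macro averaging, the scalers, and the random seeds are all choices made for this sketch. `MultinomialNB` requires non-negative inputs, so the features are min-max scaled for that model only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumption: synthetic stand-in for the Dry Bean data (16 features, 7 classes).
X, y = make_classification(
    n_samples=2000, n_features=16, n_informative=10, n_redundant=4,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)

# 80:20 train-test split, stratified so every class appears in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def evaluate(model):
    """Fit on the training split and return macro-averaged test metrics."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, average="macro", zero_division=0),
        "recall": recall_score(y_test, pred, average="macro", zero_division=0),
        "f1": f1_score(y_test, pred, average="macro", zero_division=0),
    }


# The two Naive Bayes variants plus multinomial logistic regression.
models = {
    "GaussianNB": make_pipeline(StandardScaler(), GaussianNB()),
    # clip=True keeps scaled test values in [0, 1], i.e. non-negative.
    "MultinomialNB": make_pipeline(MinMaxScaler(clip=True), MultinomialNB()),
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}
baseline_results = {name: evaluate(model) for name, model in models.items()}

# PCA sweep: retain 0.90, 0.91, ..., 1.00 of the variance, then fit GaussianNB.
variance_levels = [round(0.90 + 0.01 * i, 2) for i in range(11)]
pca_results = {}
for frac in variance_levels:
    # A float in (0, 1) selects components by explained variance;
    # None keeps every component, which corresponds to a fraction of 1.0.
    n_components = None if frac >= 1.0 else frac
    pipe = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, svd_solver="full"),
        GaussianNB(),
    )
    scores = evaluate(pipe)
    pca_results[frac] = (pipe.named_steps["pca"].n_components_, scores)

for name, scores in baseline_results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
for frac, (n_kept, scores) in pca_results.items():
    print(f"variance={frac:.2f} components={n_kept} f1={scores['f1']:.3f}")
```

To run this on the real data, replace the `make_classification` call with code that loads the Dry Bean file from the dataset link into a feature matrix `X` and label vector `y`. The scores printed for the synthetic data say nothing about the results reported in the notebook.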