Comparison of Naive Bayes Algorithm and Multinomial Logistic Regression on Dry Beans Dataset

This is a solution notebook for an assignment question from a graduate Data Mining course. Each code block is accompanied by analysis where relevant.
Dataset link: https://archive.ics.uci.edu/ml/datasets/Dry+Bean+Dataset
Broadly, the following steps have been performed in this solution notebook:

Plotted the class distribution of the dataset and analyzed it.
Performed EDA (histograms, box plots, etc.) and provided various insights on the data.
Used the t-SNE algorithm to reduce the data to 2 dimensions and plotted the result as a scatter plot.

This helps in observing the separability of the data.

Ran the sklearn implementation of Gaussian Naive Bayes and Multinomial Naive Bayes.

Reported accuracy, recall, and precision, and analyzed the differences between the two Naive Bayes implementations using an 80:20 train-test split.

Used Principal Component Analysis (PCA) to reduce the number of features and used the reduced dataset for model training.

Retained different amounts of variance, ranging from 0.9 to 1 in steps of 0.01.
Compared the results using accuracy, precision, recall, and F1-score.

Plotted ROC-AUC curves.
Trained a Multinomial Logistic Regression model as well and compared its results with Naive Bayes.
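The Naive Bayes comparison step listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes a synthetic stand-in dataset from `make_classification` (shaped like the Dry Bean data with 16 features and 7 classes) instead of `Dry_Bean_Dataset.xlsx`, macro-averaged precision and recall, and a `MinMaxScaler` step in front of `MultinomialNB`, which does not accept negative feature values. The variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: synthetic stand-in for the Dry Bean data (16 features, 7 classes).
X, y = make_classification(
    n_samples=2000, n_features=16, n_informative=8, n_redundant=4,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)

# 80:20 split, stratified so every class appears in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "GaussianNB": GaussianNB(),
    # MultinomialNB rejects negative inputs, so rescale features to [0, 1] first.
    # clip=True keeps test values inside that range as well.
    "MultinomialNB": make_pipeline(MinMaxScaler(clip=True), MultinomialNB()),
}

nb_results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    nb_results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
    }

# Print the three metrics per model, rounded for readability.
for name, scores in nb_results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Gaussian Naive Bayes models each continuous feature with a per-class normal distribution, while Multinomial Naive Bayes is designed for count-like features, so a gap between the two on continuous measurements is to be expected.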

The assumptions above and the flow of work follow the questions asked in the assignment.
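The PCA variance sweep and the comparison with logistic regression described in the steps above might look something like the sketch below. It is an illustration under stated assumptions rather than the notebook's code: the data is again a synthetic stand-in from `make_classification`, features are standardized before PCA, the macro F1-score is the only metric shown, and the names `variance_targets` and `pca_results` are hypothetical. Since scikit-learn's `PCA` only accepts a float `n_components` strictly between 0 and 1, the 1.0 target is mapped to `n_components=None` (keep all components).

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the Dry Bean data (16 features, 7 classes).
X, y = make_classification(
    n_samples=2000, n_features=16, n_informative=8, n_redundant=4,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Retained-variance targets 0.90, 0.91, ..., 1.00.
variance_targets = [round(0.90 + 0.01 * i, 2) for i in range(11)]

# Factories so that each run gets a fresh, unfitted classifier.
# With the default lbfgs solver, LogisticRegression fits a multinomial
# (softmax) model on multiclass targets in recent scikit-learn versions.
classifiers = {
    "GaussianNB": lambda: GaussianNB(),
    "LogisticRegression": lambda: LogisticRegression(max_iter=1000),
}

pca_results = []
for target in variance_targets:
    # A float in (0, 1) asks PCA for enough components to exceed that
    # share of variance; None keeps every component (the 1.0 case).
    n_components = None if target >= 1.0 else target
    row = {"variance": target}
    for name, make_clf in classifiers.items():
        pipe = make_pipeline(
            StandardScaler(),
            PCA(n_components=n_components, svd_solver="full"),
            make_clf(),
        )
        pipe.fit(X_train, y_train)
        row["n_components"] = int(pipe.named_steps["pca"].n_components_)
        row[name] = f1_score(y_test, pipe.predict(X_test), average="macro")
    pca_results.append(row)

for row in pca_results:
    print(row)
```

Wrapping the scaler, PCA, and classifier in one pipeline means the projection is learned from the training split only, which keeps the test split untouched until scoring.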

