Wine-Quality-Prediction

A final project for the Purwadhika Data Scientist course

This repository explores red wine quality and builds a model to predict it. The dataset used in this project can be found here

The dataset consists of 11 input variables, based on physicochemical analysis of red wines, and 1 output variable.

The input variables are:

1. Fixed acidity (g tartaric acid/dm3): Non-volatile acids. Tartaric, malic, citric and succinic acids are the most common ones found in wine.
2. Volatile acidity (g acetic acid/dm3): Volatile acids. Acetic acid is one of them and is unwanted in wine because of its unpleasant odor.
3. Citric acid (g/dm3): One of the fixed acids in wine, also common in citrus fruit. Gives a refreshing taste.
4. Residual sugar (g/dm3): The sugar remaining in wine after fermentation. Wines with less than 1 g/L are uncommon.
5. Chlorides (g sodium chloride/dm3): An indicator of the amount of salt in wine. High chloride levels usually mean a salty taste that can deter consumers.
6. Free sulfur dioxide (mg/dm3): The free form of SO2. Above about 50 ppm, SO2 becomes noticeable in the wine's smell and taste, with a pungent, burnt-match odor.
7. Total sulfur dioxide (mg/dm3): The total amount of SO2, both bound and free. A small amount of SO2 is important as an antimicrobial agent in wine.
8. Density (g/cm3): The density of the liquid, close to that of water (about 1 g/cm3). Alcohol concentration, sugar content and other dissolved compounds affect it.
9. pH: The power of hydrogen, a logarithmic acidity scale where 7 is neutral, values below 7 are acidic and values above 7 are basic. Most wines have a pH between 3 and 4.
10. Sulphates (g potassium sulphate/dm3): A sulfur-based additive used as an antimicrobial and antioxidant agent. Contributes to SO2 levels.
11. Alcohol (%): The alcohol content of the wine. Its level depends on the initial sugar content before fermentation and on the yeast used.
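The pH entry above can be made concrete with its definition, pH = -log10([H+]), where [H+] is the hydrogen ion concentration in mol/L. Because the scale is logarithmic, each pH unit is a tenfold change, so the typical wine range of pH 3 to 4 corresponds to [H+] between 1e-3 and 1e-4 mol/L. A small Python sketch of the conversion (the function names are illustrative, not from the project):

```python
import math


def hydrogen_ion_concentration(ph: float) -> float:
    """Return [H+] in mol/L by inverting pH = -log10([H+])."""
    return 10 ** (-ph)


def ph_from_concentration(h_plus: float) -> float:
    """Return the pH for a hydrogen ion concentration given in mol/L."""
    return -math.log10(h_plus)


# Typical wine range: one pH unit apart means a tenfold difference in [H+].
for ph in (3.0, 3.5, 4.0):
    print(f"pH {ph}: [H+] = {hydrogen_ion_concentration(ph):.1e} mol/L")
```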

The output variable is:
12. Quality: Based on sensory tests, scored between 0 and 10.
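In the dataset the quality score is an integer, while a regression model returns a continuous value that can even fall outside 0 to 10 when it extrapolates. The README does not say how predictions are mapped back onto the score; one plausible approach, sketched here as an assumption with a hypothetical helper name, is to clip to the valid range and round:

```python
def to_quality_score(raw_prediction: float, low: int = 0, high: int = 10) -> int:
    """Map a continuous regression output onto the integer 0-10 quality scale."""
    # Clip first so an extrapolated value cannot leave the valid range.
    clipped = min(max(raw_prediction, low), high)
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return int(round(clipped))


print(to_quality_score(5.6), to_quality_score(11.3), to_quality_score(-0.4))
```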

The steps I took to make the predictions were:

1. Exploratory data analysis.
2. Preprocessing, mainly outlier handling.
3. Model selection, comparing regression models only, from linear regressions to regressors such as Random Forest.
4. Deployment with Flask.
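The preprocessing step mentions outlier handling without naming the rule. A common choice is Tukey's fence: drop a row when a value lies more than 1.5 × IQR below the first quartile or above the third quartile. A minimal pandas sketch, assuming that rule (the project's notebooks may use a different criterion or threshold); the function name and the sample values are made up for illustration:

```python
import pandas as pd


def drop_iqr_outliers(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Drop rows where any of `columns` falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparisons align on column names; a row is kept only if every column is inside its fence.
    inside = ((df[columns] >= lower) & (df[columns] <= upper)).all(axis=1)
    return df.loc[inside].copy()


# Tiny made-up frame: the last chlorides value is an obvious outlier.
sample = pd.DataFrame({
    "chlorides": [0.070, 0.080, 0.075, 0.090, 0.600],
    "alcohol": [9.4, 9.8, 10.0, 9.5, 10.2],
})
cleaned = drop_iqr_outliers(sample, ["chlorides", "alcohol"])
print(cleaned)
```

Dropping rows is only one option; transforming skewed columns (for example with a log or square root) keeps every sample but reduces the pull of extreme values.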

Model Selection Results

The best evaluation scores (lowest RMSE and highest R-squared) came from the Random Forest Regressor. However, a random forest cannot extrapolate beyond the range of its training data, so I used Ridge regression instead, a model known for handling multicollinearity, which is prevalent in this data. The best dataset was the one with outliers dropped, giving an initial RMSE of 10.62% and an R-squared of 0.35.
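Both points in the paragraph above can be illustrated on synthetic data (generated here, not the wine dataset): how RMSE and R-squared are computed on a held-out set, and why a random forest cannot extrapolate. A forest predicts averages of training targets stored in its leaves, so its output can never exceed the largest training target, whereas Ridge fits a linear trend that continues past the data. Ridge's L2 penalty also stabilizes coefficient estimates when features are correlated. The `alpha` value and the data-generating line below are arbitrary choices for this sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Synthetic data (NOT the wine dataset): one feature with a linear effect plus noise.
X_train = rng.uniform(9.0, 13.0, size=(200, 1))
y_train = 0.5 * X_train[:, 0] + rng.normal(0.0, 0.2, size=200)
X_test = rng.uniform(9.0, 13.0, size=(50, 1))
y_test = 0.5 * X_test[:, 0] + rng.normal(0.0, 0.2, size=50)

models = {
    # alpha sets the strength of the L2 penalty on the coefficients.
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    scores[name] = {"rmse": rmse, "r2": float(r2_score(y_test, pred))}

# Query a point outside the training range of the feature (9 to 13).
X_outside = np.array([[15.0]])
outside = {name: float(model.predict(X_outside)[0]) for name, model in models.items()}

print(scores)
print(outside)  # the forest stays at or below max(y_train); ridge follows the trend
```

Within the training range both models may score well; the difference only shows up for inputs beyond what the model has seen.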

Dashboard

1. Home

2. Prediction

3. Result

4. Dataset

5. Visualisation
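The dashboard pages listed above are served by the Flask app from the deployment step. The repository's actual routes and templates are not described here, so the following is a hypothetical minimal JSON prediction endpoint: `create_app`, the `/predict` route and the payload format are assumptions, and the feature names follow the UCI red wine CSV header, which may differ from the dashboard's form fields:

```python
from flask import Flask, jsonify, request

# Assumed feature names (UCI red wine CSV header), in training column order.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def create_app(model):
    """Build a Flask app around any fitted model exposing .predict(rows)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        missing = [name for name in FEATURES if name not in payload]
        if missing:
            return jsonify({"error": "missing features", "missing": missing}), 400
        # Keep the column order identical to the order used in training.
        row = [[float(payload[name]) for name in FEATURES]]
        return jsonify({"quality": float(model.predict(row)[0])})

    return app
```

In practice `model` would be the fitted Ridge model, for example loaded with `joblib.load`; whether the repository saves it that way is not stated. Passing the model into a factory also makes the route easy to test with a stand-in object.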

Repository: bgt90/Wine-Quality-Prediction
