Web-App-Titanic-Survival

Check your chances of surviving the Titanic shipwreck in this web app.

Project Overview

Predicting whether you would survive the Titanic

Built a model that predicts the probability of a passenger surviving the shipwreck
Enter your passenger details in the web app to find out
Predicts correctly with 84% accuracy
Engineered a title feature from the passenger names
Engineered a feature indicating whether the person is alone, based on the number of relatives aboard the ship
Cleaned, normalized, and scaled the data
Optimized Naive Bayes, logistic regression, decision tree, k-nearest neighbors, random forest, support vector machine, and extreme gradient boosting models, then combined them with ensembling methods to reach the best model
A soft voting ensemble classifier finally achieved the best accuracy

Code and Resources Used

Python Version: 3.10.5
Packages: pandas, numpy, sklearn, requests, dill, Flask, xgboost, gunicorn, matplotlib, seaborn

Data from the passengers:

Variable	Definition	Key
survival	Survival	0 = No, 1 = Yes
pclass	Ticket class	1 = 1st, 2 = 2nd, 3 = 3rd
sex	Sex
Age	Age in years
sibsp	# of siblings / spouses aboard the Titanic
parch	# of parents / children aboard the Titanic
ticket	Ticket number
fare	Passenger fare
cabin	Cabin number
embarked	Port of Embarkation	C = Cherbourg, Q = Queenstown, S = Southampton

EDA

After getting the data, I explored it and looked for correlations:

Plotted the relationship between the features and the target variable
Compared various features to one another
Checked whether the features and the target's classes are imbalanced
Calculated correlations between the various columns
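
The checks listed above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrame holds made-up values and only borrows the Titanic column names.

```python
import pandas as pd

# Hypothetical hand-made sample; only the column names come from the dataset.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 1, 2],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Fare":     [7.25, 71.28, 7.92, 8.05, 53.10, 13.00],
})

# Class balance of the target: share of each class.
class_balance = df["Survived"].value_counts(normalize=True)

# Relationship between a feature and the target: survival rate per category.
survival_by_sex = df.groupby("Sex")["Survived"].mean()

# Pairwise correlations between the numeric columns.
correlations = df[["Survived", "Pclass", "Fare"]].corr()

print(class_balance)
print(survival_by_sex)
print(correlations)
```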

Feature engineering and cleaning

The steps I took in this phase:

Drop PassengerId (irrelevant), Name (used only to engineer the title feature), Ticket (irrelevant), and Cabin (irrelevant, with too many NaNs).
Create a new feature 'IsAlone' from the family size (SibSp + Parch + 1) to indicate whether a passenger is alone, i.e. has a family size of 1.
Extract titles from Name.
Impute missing values and normalize the numerical features.
Encode the categorical features.
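
As a rough sketch of the two engineered features, the snippet below extracts a title from Name and derives IsAlone. It assumes names follow the "Surname, Title. Given names" pattern and that IsAlone is 1 when SibSp + Parch + 1 equals 1; the regular expression is an illustrative choice, not necessarily the one used in the notebooks.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
    ],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})

# Title: the text between the comma and the first period after it.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Family size counts the passenger too; a size of 1 means travelling alone.
family_size = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (family_size == 1).astype(int)

# Name is no longer needed once the title has been extracted.
df = df.drop(columns=["Name"])
print(df)
```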

Model Building

I used only 'Age', 'Fare', 'Pclass', 'Sex', 'Embarked', 'IsAlone', and 'Title' because the EDA showed they were the most relevant.
I created a numerical pipeline and a categorical pipeline, then combined them.
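
One way to build and combine such pipelines with scikit-learn is sketched below. The split of columns into numerical and categorical groups, the imputation strategies, and the choice of StandardScaler and OneHotEncoder are assumptions made for illustration; the sample rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed grouping of the selected features.
numeric_features = ["Age", "Fare"]
categorical_features = ["Pclass", "Sex", "Embarked", "IsAlone", "Title"]

# Numerical pipeline: fill missing values, then standardize.
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical pipeline: fill missing values, then one-hot encode.
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Combine both pipelines; sparse_threshold=0 forces a dense output array.
preprocess = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_features),
        ("cat", categorical_pipeline, categorical_features),
    ],
    sparse_threshold=0,
)

# Hypothetical sample rows with one missing Age.
X = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Pclass": [3, 1, 3, 1],
    "Sex": ["male", "female", "female", "female"],
    "Embarked": ["S", "C", "S", "Q"],
    "IsAlone": [0, 0, 1, 0],
    "Title": ["Mr", "Mrs", "Miss", "Mrs"],
})

X_prepared = preprocess.fit_transform(X)
print(X_prepared.shape)
```

The fitted `preprocess` object can be placed in front of any classifier inside a single `Pipeline`, so the same transformations are applied at training and prediction time.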

Then I passed the features through the pipeline
Then I applied several machine learning models to the data and computed their cross-validation scores

I plotted the learning curves for all of those models to see which ones overfit or underfit
I finally settled on an SVM, tuned it using GridSearchCV, and computed its accuracy
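
The evaluation and tuning steps above might look roughly like this in scikit-learn. Synthetic data from `make_classification` stands in for the prepared Titanic features, and the parameter grid is a hypothetical example rather than the grid actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, learning_curve
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features and the target.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# 5-fold cross-validated accuracy of an untuned SVM.
baseline_scores = cross_val_score(SVC(), X, y, cv=5, scoring="accuracy")

# Learning-curve data: scores at growing training-set sizes.
# A large gap between training and validation scores hints at overfitting.
train_sizes, train_scores, val_scores = learning_curve(
    SVC(), X, y, cv=5, train_sizes=[0.2, 0.6, 1.0]
)

# Grid search over an illustrative grid (it includes the SVC defaults).
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(baseline_scores.mean())
print(search.best_params_, search.best_score_)
```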

Model Additional Ensemble Approaches

Here I used ensembling algorithms to try to improve the model. For computational reasons, I did not tune these ensemble models fully, which explains why their accuracy may be lower than that of the tuned SVM model.

Experimented with a hard voting classifier of three estimators (KNN, SVM, RF) (81.4%)
Experimented with a soft voting classifier of three estimators (KNN, SVM, RF) (81.7%) (best performance on the competition leaderboard)
Experimented with soft voting on all estimators scoring above 80% except XGB (KNN, RF, LR, SVM) (82.6%)
Experimented with soft voting on all estimators including XGB (KNN, SVM, RF, LR, XGB) (82.8%) (best performance)
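
A minimal sketch of the hard and soft voting setups with the three-estimator combination (KNN, SVM, RF) follows. The data is synthetic and the hyperparameters are defaults or arbitrary picks, so the scores will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features and the target.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

estimators = [
    ("knn", KNeighborsClassifier()),
    # Soft voting averages predicted probabilities, so SVC must expose them.
    ("svm", SVC(probability=True, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Hard voting: majority vote over predicted labels.
hard_vote = VotingClassifier(estimators=estimators, voting="hard")
# Soft voting: argmax of the averaged class probabilities.
soft_vote = VotingClassifier(estimators=estimators, voting="soft")

hard_scores = cross_val_score(hard_vote, X, y, cv=5, scoring="accuracy")
soft_scores = cross_val_score(soft_vote, X, y, cv=5, scoring="accuracy")
print(hard_scores.mean(), soft_scores.mean())

# A fitted soft voting ensemble can also return survival probabilities.
probabilities = soft_vote.fit(X, y).predict_proba(X[:3])
```

Adding further estimators such as logistic regression or XGBoost only requires extending the `estimators` list, provided each one implements `predict_proba` for soft voting.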

Try the app

KamgangAnthony/Web-App-Titanic-Survival
