Algorithm for predicting which customers will repay a credit loan, with 0.79 ROC AUC.

preguza/Credit_repay_prediction

The task of this project was to identify which loan applicants will manage to repay their loans. The data available for this purpose was complex, consisting of:

  • 7 .csv files related through 3 different primary keys;
  • 232 initial features;
  • 32,857,652 total number of records.

The performance metric I used was ROC AUC. It helps the business better understand how this algorithm can be used in production and which deployment options are suitable (canary deployment, partial automation, full automation, etc.).
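As a reminder of what the metric measures: ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. It is an illustration only, not the code used in this project; the function name `roc_auc` and the toy labels and scores are made up for the example.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: share of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 values (1 = positive class).
    scores: iterable of model scores, higher means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one example of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75
```

The pairwise loop is quadratic in the number of records, so on a data set of this size a library implementation such as `sklearn.metrics.roc_auc_score` would be the practical choice.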

The data was split into training and test sets. The test set was used only once, to evaluate the final performance of the algorithm. After initial data preprocessing, I trained various models and compared their performance using cross-validation. The most important results are as follows:
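The evaluation protocol described above (hold out a test set, compare models by cross-validation on the training set, score the test set once at the end) could look roughly like the following. This is a minimal sketch assuming scikit-learn; the synthetic data, the 80/20 split, the 5 folds, and the logistic regression model are assumptions chosen for illustration, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic, imbalanced stand-in for the loan data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0
)

# Hold out a test set that is not touched during model comparison.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Compare candidate models with cross-validated ROC AUC on the training set only.
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
print("CV ROC AUC per fold:", cv_scores)

# Final step, done once: fit on the full training set and score the test set.
model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print("Test ROC AUC:", test_auc)
```

Stratified folds keep the class ratio similar in every fold, which matters when one class is much rarer than the other.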

  • First iteration:
    • Logistic regression: AUC = 0.7459;
    • LGBM: AUC = 0.7550;
    • MLP: AUC = 0.7465;
    • Other models (KNN, Decision Tree, Random Forest, XGB) scored lower.
  • Feature engineering with featuretools which resulted in 1759 features:
    • LGBM: AUC = 0.7739;
  • Dimensionality reduction with PCA, reducing the number of features from 1759 to 336 while retaining 0.99 of the variance:
    • LGBM: AUC = 0.7499;
  • Feature selection by dropping least important features. Score on 145 features:
    • LGBM: AUC = 0.7748;
  • Hyperparameter tuning on 145 features using Optuna:
    • LGBM: AUC = 0.7809;
  • Performance on test data:
    • LGBM: AUC = 0.7906.
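The two feature-reduction steps in the list above can be contrasted in a small sketch: PCA that keeps 0.99 of the variance versus dropping the least important features according to a tree model. Everything here is an assumption for illustration: scikit-learn's `GradientBoostingClassifier` stands in for LGBM, the data is synthetic, and the choice of 15 retained features is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 40 features, only 8 of them informative (assumption).
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=8, random_state=0
)


def cv_auc(estimator, features):
    """Mean 3-fold cross-validated ROC AUC."""
    return cross_val_score(estimator, features, y, cv=3, scoring="roc_auc").mean()


def booster():
    """Small gradient-boosting model, used here in place of LGBM."""
    return GradientBoostingClassifier(n_estimators=50, random_state=0)


# Option A: PCA keeping enough components to explain 0.99 of the variance.
pca_model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.99, svd_solver="full"),
    booster(),
)
auc_pca = cv_auc(pca_model, X)

# Option B: rank features by importance and keep only the top 15.
# Note: ranking on all rows before cross-validating is a simplification
# and gives a slightly optimistic score.
ranker = booster().fit(X, y)
keep = np.argsort(ranker.feature_importances_)[::-1][:15]
auc_selected = cv_auc(booster(), X[:, keep])

print("PCA (0.99 variance):", auc_pca)
print("Top-15 features:    ", auc_selected)
```

PCA chooses directions by variance alone, without looking at the label, which is one plausible reason it can lower a tree model's score while importance-based selection preserves it.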

Opportunities for further improvement:

  • Fine-tune feature creation with featuretools;
  • Weight label classes;
  • Tune hyperparameters for the other algorithms that performed well on the initial data (MLP and logistic regression), then ensemble them.
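Two of the ideas above, class weighting and ensembling, might be prototyped along these lines. This is a hedged sketch assuming scikit-learn, not something implemented in this repository; the synthetic data, the model sizes, and the soft-voting scheme are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption).
X, y = make_classification(
    n_samples=800, n_features=20, weights=[0.9, 0.1], random_state=0
)

# Class weighting: "balanced" reweights classes inversely to their frequency.
log_reg = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# MLPClassifier has no class_weight parameter, so it is left unweighted here.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)

# Soft voting averages the predicted probabilities of the member models.
ensemble = VotingClassifier(
    estimators=[("log_reg", log_reg), ("mlp", mlp)],
    voting="soft",
)

ensemble_auc = cross_val_score(ensemble, X, y, cv=3, scoring="roc_auc").mean()
print("Ensemble CV ROC AUC:", ensemble_auc)
```

A tuned gradient-boosting model could be added to the `estimators` list in the same way, and comparing the ensemble's cross-validated score with each member's score shows whether the combination actually helps.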
