SohrabRezaei/Credit-Risk-Analysis
Credit-Risk-Analysis

Overview of the analysis

The credit card dataset is imbalanced, so the task is to resample it before training. First, I split the data, then oversample with the RandomOverSampler and SMOTE methods and undersample with the ClusterCentroids algorithm. Next, I apply SMOTEENN, which combines oversampling and undersampling. Finally, I use two ensemble models, EasyEnsembleClassifier and BalancedRandomForestClassifier, to predict credit card fraud risk.
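To make the oversampling step concrete, here is a minimal standard-library sketch of what random oversampling does conceptually: minority-class rows are duplicated at random until the classes are the same size. This is not the code used in the analysis, which relies on RandomOverSampler. The function name `random_oversample`, the toy rows, and the label strings are hypothetical, and the sketch resamples only a training split, following the split-then-resample order described above.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy training split: 8 non-fraudulent rows, 2 fraudulent rows.
X_train = [[i] for i in range(10)]
y_train = ["non_fraud"] * 8 + ["fraud"] * 2

X_res, y_res = random_oversample(X_train, y_train)
resampled_counts = Counter(y_res)
print(resampled_counts)
```

The test split is left untouched in this sketch so that evaluation still reflects the original class imbalance.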

Results

  • The balanced accuracy score for the RandomOverSampler oversampling method is 0.643. The precision and recall for non-fraudulent credit cards are 1 and 0.59, respectively.

  • The balanced accuracy score for the SMOTE oversampling method is 0.662. The precision and recall for non-fraudulent credit cards are 1 and 0.69, respectively.

  • The balanced accuracy score for the ClusterCentroids undersampling method is 0.544. The precision and recall for non-fraudulent credit cards are 1 and 0.4, respectively.

  • The balanced accuracy score for the SMOTEENN combined oversampling and undersampling method is 0.674. The precision and recall for non-fraudulent credit cards are 1 and 0.59, respectively.

  • The balanced accuracy score for the BalancedRandomForestClassifier ensemble method is 0.788. The precision and recall for non-fraudulent credit cards are 1 and 0.87, respectively.

  • The balanced accuracy score for the EasyEnsembleClassifier ensemble method is 0.915. The precision and recall for non-fraudulent credit cards are 1 and 0.9, respectively.
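For a two-class problem, balanced accuracy is the mean of the per-class recalls. Under that definition, the reported balanced accuracy and non-fraudulent recall together imply an approximate recall for the fraudulent class, which the results above do not state directly. The sketch below does that arithmetic; the helper names are hypothetical, and because the reported figures are rounded, the implied values are rough estimates rather than measured results.

```python
def balanced_accuracy(recalls):
    """Balanced accuracy as the unweighted mean of per-class recalls."""
    return sum(recalls) / len(recalls)


def implied_other_recall(balanced_acc, known_recall):
    """For two classes: BA = (r1 + r2) / 2, so r2 = 2 * BA - r1."""
    return 2 * balanced_acc - known_recall


# (balanced accuracy, recall for non-fraudulent cards) as reported above.
reported = {
    "RandomOverSampler": (0.643, 0.59),
    "SMOTE": (0.662, 0.69),
    "ClusterCentroids": (0.544, 0.40),
    "SMOTEENN": (0.674, 0.59),
    "BalancedRandomForestClassifier": (0.788, 0.87),
    "EasyEnsembleClassifier": (0.915, 0.90),
}

# Approximate recall for fraudulent cards implied by the rounded figures.
implied = {
    name: round(implied_other_recall(ba, recall), 3)
    for name, (ba, recall) in reported.items()
}

for name, value in implied.items():
    print(f"{name}: implied fraudulent-class recall ~ {value}")
```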

Summary

All the models had a precision of 1 for non-fraudulent cards and a precision of only 0.01 for fraudulent cards. None of them is therefore well suited to predicting fraudulent credit cards. Of the six, I recommend the EasyEnsembleClassifier model, which has the highest balanced accuracy score (0.915) and a recall of 0.9 for non-fraudulent credit cards.
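A very low precision for the rare class alongside a high recall is typical of heavily imbalanced data: even a small false-positive rate on the majority class produces far more false alarms than there are true positives. The sketch below shows this relationship using the definition of precision. The function name and the prevalence values are hypothetical and are not taken from the dataset; only the 0.9 non-fraudulent recall comes from the results above.

```python
def minority_precision(minority_recall, majority_recall, prevalence):
    """Precision for the minority class, given both class recalls and the
    minority-class share of the data (prevalence)."""
    true_pos = minority_recall * prevalence
    false_pos = (1 - majority_recall) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical scenario: both classes are recalled at 0.9.
precision_rare = minority_precision(0.9, 0.9, prevalence=0.005)
precision_even = minority_precision(0.9, 0.9, prevalence=0.5)

print(f"precision at 0.5% prevalence: {precision_rare:.3f}")
print(f"precision at 50% prevalence:  {precision_even:.3f}")
```

With identical recalls, precision for the rare class drops sharply as its share of the data shrinks, which is consistent with the pattern seen across all six models.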
