This project aims to use machine learning models on Kaggle data to predict corporate credit ratings to aid investment decisions.

ericjjohnson2/corporate_credit_rating

corporate_credit_risk

Group Details:

  • Group 3 Members: Mitchell Lor, Frewoini Mebrahtu, Eric Johnson, Lucinda Hodgson

Project Title: Predicting Corporate Credit Ratings

Data Source: Kaggle Dataset Link

Overview:

Introduction: This project predicts corporate credit ratings to support investment decisions. Using a dataset sourced from Kaggle, the group preprocessed the data and then trained, tuned, and evaluated several machine learning models to judge how accurately and reliably they predict credit ratings.

Project Details:

  1. Data Acquisition and Preprocessing:

    • A dataset of over 7,000 records was sourced from Kaggle.

    • The data was loaded into a database using Python Pandas, followed by SQL queries for data retrieval.

    • Cleaning and preprocessing involved dropping unnecessary columns and identifying significant metrics for analysis.

      binaryimage
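The load-then-query step above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes SQLite as the database, and the table name `credit_ratings` and the column names are hypothetical stand-ins for the Kaggle data.

```python
import sqlite3

import pandas as pd

# Tiny stand-in for the Kaggle CSV; column names are hypothetical.
raw = pd.DataFrame({
    "Name": ["Acme Corp", "Globex", "Initech"],
    "Rating": ["AA", "BB+", "BBB-"],
    "currentRatio": [1.8, 0.9, 1.2],
    "Date": ["2015-01-01", "2016-06-30", "2014-03-15"],
})

# Assumption: SQLite, kept in memory for the sake of the example.
conn = sqlite3.connect(":memory:")
raw.to_sql("credit_ratings", conn, index=False, if_exists="replace")

# Retrieve only the columns needed for modelling; the unused ones
# (here Name and Date) are dropped by simply not selecting them.
query = "SELECT Rating, currentRatio FROM credit_ratings"
df = pd.read_sql_query(query, conn)
conn.close()

print(df)
```

Selecting columns in SQL is one way to drop unnecessary fields early; the same result could be reached with `DataFrame.drop` after loading.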

  2. Initial Attempts:

    • Deep learning techniques were explored first, but they ran into two roadblocks: overfitting and imbalanced data.
    • The networks overfit the training data, which the group attributed to the simplicity of the data, and they generalized poorly as a result.
    • The classes were imbalanced: investment-grade records dominated the dataset, which made it hard for the networks to learn the minority class.
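To make the imbalance concrete, the sketch below maps S&P-style letter ratings to a binary label and counts each class. It assumes the target was binarized at the investment-grade boundary (BBB- and above versus BB+ and below, the cutoff mentioned in the conclusion); the sample ratings are made up for illustration.

```python
from collections import Counter

# S&P-style ratings treated as investment grade (BBB- and above).
INVESTMENT_GRADE = {
    "AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-",
}


def to_binary(rating: str) -> int:
    """Return 1 for investment grade, 0 for junk (BB+ or lower)."""
    return 1 if rating.strip().upper() in INVESTMENT_GRADE else 0


# Made-up sample; the real dataset has over 7,000 records.
ratings = ["AAA", "BBB", "BB+", "A-", "CCC", "BBB-", "A", "B"]
labels = [to_binary(r) for r in ratings]

counts = Counter(labels)
share_investment_grade = counts[1] / len(labels)
print(counts, share_investment_grade)
```

Checking the class shares before training shows how much of a model's accuracy could come from always predicting the majority class.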
  3. Model Evaluation:

    • Three models were developed and evaluated:
      • Model 1: Loss - 0.636, Accuracy - 0.667
      • Model 2: Loss - 0.490, Accuracy - 0.791
      • Model 3: Loss - 0.439, Accuracy - 0.797

    deeplearning
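For readers unfamiliar with the two numbers reported per model, here is a small sketch of how loss and accuracy can be computed from predicted probabilities. It assumes the loss is binary cross-entropy, which the write-up does not state, and the labels and probabilities are invented.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Invented example: 1 = investment grade, 0 = junk.
y_true = [1, 1, 0, 1]
y_prob = [0.9, 0.6, 0.4, 0.3]

loss = binary_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"loss={loss:.3f}, accuracy={acc:.3f}")
```

Loss penalizes confident mistakes more heavily than accuracy does, so two models with similar accuracy (such as Models 2 and 3) can still differ in loss.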

  4. Random Forest Model for Credit Rating Forecasting:

    • A Random Forest Classifier model was employed to forecast credit ratings based on a curated dataset.

    • Data preprocessing involved loading and cleaning data, extracting essential features, and incorporating dummy variables for categorical data representation.

    • The dataset was split into training and testing sets, and standard scaling was applied for consistent feature scaling.

    • A Random Forest Classifier with 500 decision trees was trained on the scaled data to capture complex relationships.

      Random Forest Confusion Matrix

      Random Forest Importances Plot

      Random Forest ROC Curve

      PDF Example of a Random Tree in Our Model
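The pipeline described above (dummy variables, train/test split, standard scaling, a 500-tree forest) can be sketched with scikit-learn as follows. The data is synthetic and the column names (`currentRatio`, `debtEquityRatio`, `Sector`) are hypothetical; only the 500-tree setting comes from the write-up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the curated dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "currentRatio": rng.normal(1.5, 0.5, n),
    "debtEquityRatio": rng.normal(1.0, 0.4, n),
    "Sector": rng.choice(["Energy", "Tech", "Utilities"], n),
})
# Made-up rule for the label, purely so the example has a signal to learn.
df["investment_grade"] = (
    df["currentRatio"] - df["debtEquityRatio"] > 0.3
).astype(int)

# Dummy variables for the categorical column.
X = pd.get_dummies(
    df.drop(columns="investment_grade"), columns=["Sector"], dtype=float
)
y = df["investment_grade"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(X_train_scaled, y_train)
test_accuracy = model.score(X_test_scaled, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Tree ensembles are insensitive to feature scaling, so the scaling step mainly keeps the preprocessing consistent with other models trained on the same features.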

  5. Model Evaluation and Feature Importance Analysis:

    • The model's performance was evaluated using standard metrics: a confusion matrix, the accuracy score, and a classification report.
    • Additionally, a feature importance analysis was conducted to identify the significant contributors to credit rating prediction.
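A minimal sketch of these evaluation steps with scikit-learn is shown below. It uses a synthetic, imbalanced dataset from `make_classification` and a smaller forest (100 trees) to keep it quick; the feature names and class names are placeholders, not the project's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
)
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 30% class 0, 70% class 1.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3,
    weights=[0.3, 0.7], random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows of the confusion matrix are true classes, columns are predictions.
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
report = classification_report(
    y_test, y_pred, target_names=["junk", "investment grade"]
)

# Impurity-based importances, highest first; they sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(cm)
print(report)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

On imbalanced data the per-class recall in the classification report is usually more informative than overall accuracy, because it shows how many minority-class records are actually caught.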
  6. Search API to Test Model:

    • Live company data is pulled from the Alpha Vantage API.

    • Given a ticker symbol, the app requests the company's recent financial figures and feeds them into the trained models.

      preview
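The ticker lookup can be sketched as below. This is an assumption-laden illustration: the write-up does not say which Alpha Vantage endpoint was used, so the `OVERVIEW` function is a guess, and the field names in `FEATURE_FIELDS` are assumed and should be checked against a real response. The sample payload is invented so the example runs without a network call or API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.alphavantage.co/query"

# Assumed field names; verify them against an actual API response.
FEATURE_FIELDS = [
    "ProfitMargin",
    "OperatingMarginTTM",
    "ReturnOnAssetsTTM",
    "ReturnOnEquityTTM",
]


def build_url(ticker, api_key, function="OVERVIEW"):
    """Build the query URL for one ticker (endpoint choice is an assumption)."""
    params = {"function": function, "symbol": ticker, "apikey": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_overview(ticker, api_key):
    """Fetch and decode the JSON payload; not called in this example."""
    with urlopen(build_url(ticker, api_key), timeout=10) as resp:
        return json.load(resp)


def to_feature_row(payload):
    """Convert string values to floats; missing or non-numeric become None."""
    row = {}
    for field in FEATURE_FIELDS:
        try:
            row[field] = float(payload.get(field))
        except (TypeError, ValueError):
            row[field] = None
    return row


# Invented payload standing in for a real response.
sample = {
    "Symbol": "IBM",
    "ProfitMargin": "0.12",
    "OperatingMarginTTM": "0.15",
    "ReturnOnAssetsTTM": "None",
}
row = to_feature_row(sample)
print(build_url("IBM", "demo"))
print(row)
```

Any feature the model was trained on but the API does not return (such as industry sector, as noted in the conclusion) has to be dropped or imputed before prediction.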

Conclusion:

The machine learning models produced encouraging results. Across the models tested, a pattern emerged: some were better at predicting positive outcomes, while others were better at identifying negative ones. The random forest models were the top performers, reaching 95% accuracy on the test dataset.

However, when the models were applied to real-world data, they struggled to predict junk credit status (S&P BB+ or lower). Techniques such as oversampling and undersampling were used to address the class imbalance, but the models still failed to identify junk credit reliably. They did, however, predict good credit status consistently.
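As an illustration of the oversampling idea, the sketch below balances a toy training set by resampling the minority class with replacement using `sklearn.utils.resample`. The data and column names are invented, and the project may have used a different method or library.

```python
import pandas as pd
from sklearn.utils import resample

# Toy training split: six investment-grade rows (1), two junk rows (0).
train = pd.DataFrame({
    "currentRatio": [1.9, 1.7, 1.6, 1.5, 1.4, 1.3, 0.8, 0.7],
    "investment_grade": [1, 1, 1, 1, 1, 1, 0, 0],
})

majority = train[train["investment_grade"] == 1]
minority = train[train["investment_grade"] == 0]

# Draw minority rows with replacement until both classes are the same size.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)

# Recombine and shuffle.
balanced = (
    pd.concat([majority, minority_upsampled])
    .sample(frac=1, random_state=0)
    .reset_index(drop=True)
)
print(balanced["investment_grade"].value_counts())
```

Resampling should be applied to the training split only; if duplicated rows leak into the test set, the reported accuracy will be optimistic.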

To improve model performance, the group also explored k-fold cross-validation and feature engineering. One notable limitation was that the data returned by our API did not include industry sector. Sector was available in the training and testing datasets, and model performance improved when it was used, but these features had to be dropped because the API could not supply them. Incorporating industry sector data would therefore likely improve prediction accuracy.
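A minimal sketch of k-fold cross-validation with scikit-learn follows. It uses stratified folds so each fold keeps the class ratio, and scores with balanced accuracy so the minority class counts equally; both choices are assumptions suited to imbalanced data, not details taken from the project. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data: roughly 30% class 0, 70% class 1.
X, y = make_classification(
    n_samples=300, n_features=6, weights=[0.3, 0.7], random_state=0
)

# Five stratified folds, shuffled with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# One balanced-accuracy score per fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
print(scores, scores.mean())
```

The spread of the per-fold scores gives a rough sense of how much a single train/test split could overstate or understate performance.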
