corporate_credit_risk

Group Details:

Group 3 Members: Mitchell Lor, Frewoini Mebrahtu, Eric Johnson, Lucinda Hodgson

Project Title: Predicting Corporate Credit Ratings

Data Source: Kaggle Dataset Link

Overview:

Introduction: The project applies data analysis and machine learning to predict corporate credit ratings in support of investment decisions. Using a dataset sourced from Kaggle, the group preprocessed the data and then trained several machine learning models, tuning and evaluating each one for accuracy and reliability in predicting credit ratings.

Project Details:

Data Acquisition and Preprocessing:
- A dataset of over 7,000 records was sourced from Kaggle.
- The data was loaded into a database using Python Pandas, followed by SQL queries for data retrieval.
- Cleaning and preprocessing involved dropping unnecessary columns and identifying significant metrics for analysis.
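The load-then-query flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the in-memory SQLite database, the table name `ratings`, and all column names are assumptions made for the example, and the three rows are made-up stand-ins for the Kaggle CSV.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV; column names are illustrative only.
raw = pd.DataFrame(
    {
        "Name": ["Alpha Corp", "Beta Inc", "Gamma Ltd"],
        "Rating": ["AA", "BB+", "BBB-"],
        "currentRatio": [1.8, 0.9, 1.2],
        "debtEquityRatio": [0.4, 2.1, 1.0],
        "Date": ["2015-01-01", "2015-02-01", "2015-03-01"],
    }
)

# Load the frame into a database (in-memory SQLite is an assumption here).
conn = sqlite3.connect(":memory:")
raw.to_sql("ratings", conn, index=False, if_exists="replace")

# Retrieve only the columns needed for modelling with a SQL query,
# which drops the unnecessary columns (Name, Date) in the same step.
query = "SELECT Rating, currentRatio, debtEquityRatio FROM ratings"
df = pd.read_sql_query(query, conn)
conn.close()

# Basic cleaning: discard rows with missing values.
df = df.dropna()
print(df.shape)
```

Selecting columns in the SQL query is one way to drop unneeded fields; dropping them later with `DataFrame.drop` would work equally well.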
Initial Attempts:
- Deep learning techniques were explored first, but they ran into roadblocks from overfitting and imbalanced data.
- Overfitting was attributed to the simplicity of the data and led to poor generalization.
- The data was imbalanced, with investment-grade records dominating, which posed challenges for deep learning.
Model Evaluation:
- Three models were developed and evaluated:
  - Model 1: Loss - 0.636, Accuracy - 0.667
  - Model 2: Loss - 0.490, Accuracy - 0.791
  - Model 3: Loss - 0.439, Accuracy - 0.797
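To help read the loss and accuracy figures above, here is a small sketch of how the two metrics are typically computed for a two-class model. It assumes the loss is binary cross-entropy and that predictions are thresholded at 0.5; the source does not state either, and the labels and probabilities below are made up.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Made-up labels (1 = investment grade, 0 = junk) and predicted probabilities.
y_true = [1, 1, 0, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.3, 0.2]

loss = binary_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"loss={loss:.3f} accuracy={acc:.3f}")

# Reference point: always predicting 0.5 gives a loss of ln(2), about 0.693.
baseline = binary_cross_entropy([1, 0], [0.5, 0.5])
```

Under this assumption, a loss close to 0.693 would indicate a model that is only slightly more informative than always predicting 0.5.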
Random Forest Model for Credit Rating Forecasting:
- A Random Forest Classifier model was employed to forecast credit ratings based on a curated dataset.
- Data preprocessing involved loading and cleaning data, extracting essential features, and incorporating dummy variables for categorical data representation.
- The dataset was split into training and testing sets, and standard scaling was applied for consistent feature scaling.
- A Random Forest Classifier with 500 decision trees was trained on the scaled data to capture complex relationships.
  
  PDF Example of a Random Tree in Our Model
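The preprocessing and training steps listed above can be sketched as follows. This is an illustration on synthetic data under stated assumptions, not the project's notebook: the column names, the toy labelling rule, the 75/25 split, and the random seeds are all hypothetical. Only the 500-tree forest, the dummy variables, the train/test split, and standard scaling come from the description.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in data; column names are hypothetical.
df = pd.DataFrame(
    {
        "currentRatio": rng.normal(1.5, 0.5, n),
        "debtEquityRatio": rng.normal(1.0, 0.4, n),
        "Sector": rng.choice(["Energy", "Tech", "Utilities"], n),
    }
)
# Toy label for illustration only: 1 = investment grade, 0 = junk.
df["investment_grade"] = (
    df["currentRatio"] - df["debtEquityRatio"] > 0.3
).astype(int)

# Dummy variables represent the categorical column numerically.
X = pd.get_dummies(
    df.drop(columns="investment_grade"), columns=["Sector"], dtype=float
)
y = df["investment_grade"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Random forest with 500 decision trees, as described above.
model = RandomForestClassifier(n_estimators=500, random_state=1)
model.fit(X_train_scaled, y_train)

test_accuracy = model.score(X_test_scaled, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Fitting the scaler on the training split alone keeps test-set statistics out of training. Tree splits are unaffected by monotonic rescaling, so the scaling step mainly keeps the inputs consistent with other models trained on the same features.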
Model Evaluation and Feature Importance Analysis:
- The model's performance was evaluated using standard metrics: a confusion matrix, an accuracy score, and a classification report.
- Additionally, a feature importance analysis was conducted to identify the significant contributors to credit rating prediction.
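The evaluation steps above map onto standard scikit-learn calls. The sketch below uses a synthetic dataset from `make_classification` and hypothetical feature names, so the numbers it prints say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic two-class data; class 1 (70%) plays the role of investment grade.
X, y = make_classification(
    n_samples=500,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    weights=[0.3, 0.7],
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows of the confusion matrix are true classes, columns are predictions.
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
report = classification_report(
    y_test, y_pred, target_names=["junk", "investment grade"]
)
print(cm)
print(f"accuracy: {acc:.3f}")
print(report)

# Impurity-based importances sum to 1; sort them from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

With imbalanced classes, the per-class recall in the classification report is usually more informative than overall accuracy, because a model can score well overall while missing most of the minority class.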
Search API to Test Model:
- Recent financial data is pulled from the Alpha Vantage API.
- The API is queried by ticker symbol, and each company's recent financial performance is fed into the models.
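A ticker-based lookup along these lines might look like the sketch below. It assumes Alpha Vantage's `INCOME_STATEMENT` function and its `annualReports` response layout. The source does not say which endpoint or fields the project used, so the helper names and the derived net-margin feature are hypothetical, and the sample payload is made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.alphavantage.co/query"


def build_url(function, ticker, api_key):
    """Build a query URL for one Alpha Vantage function and ticker."""
    params = {"function": function, "symbol": ticker, "apikey": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Download and decode a JSON response (performs a network call)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def latest_net_margin(statement):
    """Net income / revenue for the most recent annual report, or None."""
    reports = statement.get("annualReports") or []
    if not reports:
        return None
    latest = max(reports, key=lambda r: r.get("fiscalDateEnding", ""))
    try:
        # Values arrive as strings and may be missing or non-numeric.
        return float(latest["netIncome"]) / float(latest["totalRevenue"])
    except (KeyError, TypeError, ValueError, ZeroDivisionError):
        return None


# Made-up payload in the assumed response shape; no network access needed.
sample = {
    "annualReports": [
        {"fiscalDateEnding": "2022-12-31", "totalRevenue": "1000", "netIncome": "100"},
        {"fiscalDateEnding": "2023-12-31", "totalRevenue": "2000", "netIncome": "300"},
    ]
}
url = build_url("INCOME_STATEMENT", "IBM", "demo")
margin = latest_net_margin(sample)
print(url)
print(margin)
```

In real use, `fetch_json(url)` would replace the sample payload, and the derived values would need the same column order and scaling as the training features before being passed to a model.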

Conclusion:

The application of machine learning has yielded encouraging outcomes. Experimentation with various models revealed a pattern: some models excel at predicting positive outcomes, while others are better at identifying negative outcomes. The random forest models are the top performers, with a 95% accuracy rate on the test dataset.

However, challenges arose during deployment in real-world scenarios, particularly in predicting junk credit status (S&P BB+ or lower). Despite techniques such as oversampling and undersampling to address class imbalance, the models struggled to accurately identify instances of junk credit. They did, however, exhibit consistent success in predicting good credit status.
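As a rough illustration of the junk threshold and of oversampling, the sketch below labels ratings on the standard S&P long-term scale and then duplicates minority-class rows at random until the classes are balanced. The helper names and sample ratings are hypothetical, and the source does not say which resampling implementation the project used.

```python
import random
from collections import Counter

# S&P long-term rating scale, best to worst.
SP_SCALE = [
    "AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
    "BBB+", "BBB", "BBB-",
    "BB+", "BB", "BB-", "B+", "B", "B-",
    "CCC+", "CCC", "CCC-", "CC", "C", "D",
]
LOWEST_INVESTMENT_GRADE = SP_SCALE.index("BBB-")


def is_junk(rating):
    """True for BB+ or lower, the junk threshold stated above."""
    return SP_SCALE.index(rating) > LOWEST_INVESTMENT_GRADE


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels


# Made-up sample: four investment-grade ratings and two junk ratings.
ratings = ["AA", "BBB-", "BB+", "A", "BBB", "B"]
labels = [is_junk(r) for r in ratings]

balanced_rows, balanced_labels = random_oversample(ratings, labels)
print(Counter(labels))
print(Counter(balanced_labels))
```

Resampling is normally applied to the training split only, so that duplicated rows cannot leak into the test set and inflate the reported scores.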

To enhance model performance, alternative methods such as k-fold cross-validation and feature engineering were explored. One notable limitation was the absence of industry sector information in our API. Sector data was available in the training and testing datasets, and model performance improved when it was used, but these features were dropped because the API could not supply them. Incorporating industry sector data could therefore significantly enhance prediction accuracy.
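K-fold cross-validation on an imbalanced problem can be sketched as below. The stratified variant, the five folds, the recall scoring, and the synthetic data are choices made for this example, not details taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data: class 1 (about 20%) plays the role of junk credit.
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=4,
    weights=[0.8, 0.2],
    random_state=0,
)

# Stratified folds keep the class ratio roughly constant in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Recall on class 1 measures how many minority cases are actually found.
scores = cross_val_score(model, X, y, cv=cv, scoring="recall")
print(scores)
print(f"mean recall: {scores.mean():.3f}")
```

Scoring by minority-class recall instead of accuracy targets the weakness described above, where overall accuracy stays high while junk credit goes undetected.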
