This repository contains a machine learning project for predicting the survival status of patients diagnosed with liver cirrhosis. It combines data preprocessing with several classification models to predict survival status from clinical and laboratory features.
Liver cirrhosis is a chronic condition that significantly impacts patient survival. This project leverages machine learning to classify survival status into three categories:
- C: censored (patient alive at the end of follow-up)
- D: death
- CL: censored due to liver transplantation
The goal is to provide an automated and efficient way to assist clinicians in decision-making and resource allocation.
The dataset includes clinical and laboratory data for patients diagnosed with liver cirrhosis.
- Training data: 224 samples with 19 columns (18 features plus the target, 'Status').
- Test data: 88 samples with the same 18 features and no 'Status' column.
Preprocessing steps:
- Imputation of missing numeric values with the median of each column.
- One-hot encoding of categorical features.
- Balancing of the target labels with SMOTE (Synthetic Minority Oversampling Technique).
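The imputation and encoding steps listed above can be sketched without third-party libraries. This is a minimal illustration, not the project's code: rows are assumed to be plain dicts, missing numeric values are assumed to be None, categorical columns are assumed to have no missing values, and the column names in the example (Bilirubin, Drug) and the helper names are hypothetical. Medians and category levels are learned once on the training rows and then reused for the test rows, so no test information leaks into preprocessing.

```python
from statistics import median


def impute_median(rows, numeric_cols, medians=None):
    """Fill None values in numeric columns with per-column medians.

    If `medians` is None they are computed from the non-missing values in `rows`
    (use this on the training set); pass the returned dict to reuse them on test rows.
    """
    if medians is None:
        medians = {
            col: median(r[col] for r in rows if r[col] is not None)
            for col in numeric_cols
        }
    filled = [
        {k: (medians[k] if k in medians and v is None else v) for k, v in r.items()}
        for r in rows
    ]
    return filled, medians


def one_hot(rows, categorical_cols, categories=None):
    """Replace each categorical column with 0/1 indicator columns named '<col>_<level>'.

    Levels are learned from `rows` when `categories` is None; levels unseen
    during training simply produce all-zero indicators.
    """
    if categories is None:
        categories = {col: sorted({r[col] for r in rows}) for col in categorical_cols}
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k not in categories}
        for col, levels in categories.items():
            for level in levels:
                out[f"{col}_{level}"] = int(r[col] == level)
        encoded.append(out)
    return encoded, categories


# Hypothetical toy rows: fit on training data, then apply to test data.
train = [
    {"Bilirubin": 1.0, "Drug": "Placebo"},
    {"Bilirubin": None, "Drug": "D-penicillamine"},
    {"Bilirubin": 3.0, "Drug": "Placebo"},
]
train_filled, medians = impute_median(train, ["Bilirubin"])
train_encoded, levels = one_hot(train_filled, ["Drug"])

test = [{"Bilirubin": None, "Drug": "Placebo"}]
test_filled, _ = impute_median(test, ["Bilirubin"], medians)
test_encoded, _ = one_hot(test_filled, ["Drug"], levels)
print(test_encoded)  # [{'Bilirubin': 2.0, 'Drug_D-penicillamine': 0, 'Drug_Placebo': 1}]
```

scikit-learn covers the same ground with SimpleImputer(strategy="median") and OneHotEncoder, whose fit/transform split enforces the same train-then-apply discipline.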
Project workflow:
- Data cleaning and preprocessing: handled missing values and categorical encoding.
- Class balancing: addressed label imbalance using SMOTE.
- Model training and evaluation:
  - Logistic Regression
  - Random Forest
  - Support Vector Machine (SVM)
- Prediction: used the best-performing model to generate test predictions.
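The class-balancing step relies on SMOTE, which creates synthetic minority samples by interpolating between a sample and one of its nearest neighbours from the same class. The sketch below is a dependency-free illustration of that idea, not the project's implementation; it assumes all features are already numeric (for example after one-hot encoding), and the function name smote and its parameters are hypothetical. In practice the imbalanced-learn library provides this as imblearn.over_sampling.SMOTE, used via fit_resample. Oversampling is normally applied to the training split only, so that synthetic points never reach the validation or test data.

```python
import math
import random


def smote(X, y, k=5, seed=0):
    """Oversample each non-majority class to the majority-class size.

    For every synthetic sample: pick a random original sample of the class,
    find its k nearest same-class neighbours (Euclidean distance), pick one of
    them, and place a new point at a random position on the segment between them.
    """
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(list(xi))
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = [list(xi) for xi in X], list(y)
    for label, rows in by_class.items():
        if len(rows) < 2:
            continue  # interpolation needs at least two samples of the class
        kk = min(k, len(rows) - 1)
        for _ in range(target - len(rows)):
            i = rng.randrange(len(rows))
            base = rows[i]
            neighbours = sorted(
                (r for j, r in enumerate(rows) if j != i),
                key=lambda r, b=base: math.dist(b, r),
            )[:kk]
            nn = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            X_out.append([b + gap * (n - b) for b, n in zip(base, nn)])
            y_out.append(label)
    return X_out, y_out


# Toy data: four 'C' samples, two 'CL' samples.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0], [11.0, 11.0]]
y = ["C", "C", "C", "C", "CL", "CL"]
X_bal, y_bal = smote(X, y, k=1)
print(y_bal.count("C"), y_bal.count("CL"))  # 4 4
```

Interpolating one-hot columns yields fractional values; imbalanced-learn's SMOTENC variant is designed for mixed categorical and numeric features.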
Models compared:
- Logistic Regression: a simple, interpretable baseline.
- Random Forest: less prone to overfitting than a single decision tree and able to capture non-linear relationships.
- Support Vector Machine (SVM): versatile and effective in high-dimensional feature spaces.
The models were evaluated on a validation set:
- Random Forest performed best, with 82% accuracy and strong recall on the minority class ('CL').
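Because the class labels are imbalanced, overall accuracy alone can hide poor performance on the minority class, which is why recall on 'CL' is worth reporting alongside accuracy. The short sketch below shows how both metrics are computed; the labels are toy values, not the project's validation results, and the helper names are hypothetical. scikit-learn offers equivalents in sklearn.metrics.accuracy_score and sklearn.metrics.recall_score with average=None.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in support.items()}


y_true = ["C", "C", "D", "CL", "CL"]
y_pred = ["C", "D", "D", "CL", "C"]
print(accuracy(y_true, y_pred))          # 0.6
print(recall_per_class(y_true, y_pred))  # {'C': 0.5, 'D': 1.0, 'CL': 0.5}
```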
Predictions for the test dataset were saved to test_predictions.csv.
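A minimal sketch of writing a predictions file with Python's standard csv module. The column layout (a row index plus a 'Status' column) and the function name write_predictions are assumptions for illustration; the actual layout of test_predictions.csv is not specified above.

```python
import csv
import io


def write_predictions(labels, fh):
    """Write a header and one (row index, predicted Status) line per test sample."""
    writer = csv.writer(fh)
    writer.writerow(["row", "Status"])
    for i, label in enumerate(labels):
        writer.writerow([i, label])


# In-memory demo; for a real file use open("test_predictions.csv", "w", newline="").
buf = io.StringIO()
write_predictions(["C", "D", "CL"], buf)
print(buf.getvalue().splitlines())  # ['row,Status', '0,C', '1,D', '2,CL']
```

Opening the file with newline="" follows the csv module's documented recommendation and avoids blank lines between rows on Windows.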