
The Exploratory Data Analysis and Machine Learning Model Training for the Student Performance Data


AdritPal08/EDA-and-ML-Model-Training-of-Student-Performance-Data



EDA and ML Model Training of Student Performance Data

Life Cycle of this Project:

  1. Understanding the Problem Statement: The goal is to determine how variables such as gender, race/ethnicity, parental level of education, lunch, and test preparation course affect student performance (test scores).
  2. Data Collection: Relevant data was gathered from Kaggle.
  3. Data Checks: A series of data checks was performed to ensure that the data was clean, complete, and in the correct format. This included checking for missing values, duplicate values, and outliers, as well as inspecting the data type and the number of unique values in each column.
  4. Exploratory Data Analysis (EDA): The data was analyzed to understand its structure, patterns, and relationships. This involved computing summary statistics, exploring correlations between variables, identifying potential outliers or missing values, and finding numerical and categorical columns along with the number of unique values in each categorical column.
  5. Data Visualization: Visualizations were created to identify trends and patterns that may be difficult to see in tabular format, helping to gain insights quickly and communicate results effectively to others.
  6. Data Pre-Processing: The data was transformed to make it suitable for use with machine learning models. This involved techniques such as scaling, normalization, feature selection, or feature engineering.
  7. Model Training: Machine learning models were built using the pre-processed data. The data was split into training and test sets, and the training set was used to train the models.
  8. Model Evaluation: The performance of the models was evaluated using metrics such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R² score, and accuracy. This helped to determine which models performed best.
  9. Choosing the Best Model: Based on the evaluation results, the best-performing model was chosen for predicting student performance.
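
The data checks, pre-processing, training, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: it assumes pandas and scikit-learn are installed, it uses a small synthetic table instead of the Kaggle dataset, and the column names (`gender`, `lunch`, `test_prep`, `score`), the helper function names, and the choice of one-hot encoding with a plain linear regression are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def make_toy_data(n_rows=200, seed=0):
    """Build a synthetic table. NOT the Kaggle data; column names are assumed."""
    rng = np.random.default_rng(seed)
    prep = rng.choice(["none", "completed"], size=n_rows)
    lunch = rng.choice(["standard", "free/reduced"], size=n_rows)
    gender = rng.choice(["female", "male"], size=n_rows)
    # Made-up relationship between the categories and the score, plus noise.
    score = (
        60.0
        + 8.0 * (prep == "completed")
        + 6.0 * (lunch == "standard")
        + rng.normal(0.0, 5.0, size=n_rows)
    )
    return pd.DataFrame(
        {"gender": gender, "lunch": lunch, "test_prep": prep, "score": score}
    )


def run_data_checks(df):
    """Step 3: count missing values, duplicate rows, and unique values per column."""
    return {
        "missing": int(df.isna().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
        "unique_per_column": df.nunique().to_dict(),
    }


def train_and_evaluate(df, target="score", seed=42):
    """Steps 6-8: one-hot encode, split, fit a linear model, score on the test set."""
    features = df.drop(columns=[target])
    categorical = list(features.columns)  # every feature in the toy table is categorical

    preprocess = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)]
    )
    model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])

    X_train, X_test, y_train, y_test = train_test_split(
        features, df[target], test_size=0.2, random_state=seed
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, predictions))),
        "mae": float(mean_absolute_error(y_test, predictions)),
        "r2": float(r2_score(y_test, predictions)),
    }


if __name__ == "__main__":
    data = make_toy_data()
    print(run_data_checks(data))
    print(train_and_evaluate(data))
```

To compare several models (step 9), one option is to call `train_and_evaluate` with a different regressor in the pipeline each time and keep the one with the lowest RMSE or highest R² on the held-out test set. Note that RMSE, MAE, and R² apply to regression on the test scores; accuracy only applies if the scores are first converted into classes.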
