Skip to content

Latest commit

 

History

History
28 lines (15 loc) · 2.86 KB

README.md

File metadata and controls

28 lines (15 loc) · 2.86 KB

Description of Data Management and Model Evaluation Program

The provided code imports and processes a dataset containing information about recipes and their corresponding cuisines. It then proceeds to train and evaluate several machine learning models to predict the cuisine of a recipe based on its ingredients.

Program Workflow

  1. Data Import and Preprocessing: The program starts by importing the necessary libraries and datasets from Kaggle. It then reads the training and test data into Pandas DataFrames for further processing.

  2. Data Exploration and Visualization: The program performs exploratory data analysis (EDA) on the training data, visualizing the distribution of cuisines, the number of ingredients per recipe, and the relationship between the number of ingredients and cuisines.

  3. Model Training and Evaluation:

    • Multinomial Naive Bayes (MNB): The program uses a Multinomial Naive Bayes classifier to predict the cuisine based on the ingredients. It vectorizes the ingredients using CountVectorizer, trains the model, and evaluates its performance using accuracy score, confusion matrix, and classification report.

    • XGBoost Model: Next, the program employs an XGBoost classifier to predict the cuisine. It encodes the cuisine labels, vectorizes the ingredients using CountVectorizer, trains the XGBoost model, and evaluates its performance on the validation set using accuracy score, confusion matrix, and classification report.

    • Convolutional Neural Network (CNN): The program utilizes a CNN model to predict the cuisine based on the ingredients. It tokenizes the ingredients, pads sequences to a fixed length, builds and compiles the CNN model, and trains it on the training set with early stopping. It then evaluates the model's performance on the test set using accuracy score, confusion matrix, and classification report.

    • Random Forest Model: Finally, the program employs a Random Forest classifier to predict the cuisine. It vectorizes the ingredients using TfidfVectorizer, trains the Random Forest model, and evaluates its performance on the validation set using accuracy score, confusion matrix, and classification report.

Features and Outputs

  • Data Visualization: The program visualizes the distribution of cuisines, the number of ingredients per recipe, and the relationship between the number of ingredients and cuisines through various plots and charts.

  • Model Evaluation: For each model, the program calculates and displays evaluation metrics such as accuracy score, confusion matrix, and classification report on the validation or test set.

Conclusion

Overall, the program efficiently manages the data, explores its characteristics, trains multiple machine learning models, and evaluates their performance, providing valuable insights into predicting the cuisine of recipes based on their ingredients.