Recipe Ingredients Dataset Analysis and Cuisine Classification

Project Overview

This repository contains an analysis of a dataset of recipes and their ingredients, and uses those ingredients to predict each recipe's cuisine. The analysis covers data exploration, visualization, and classification modeling with several algorithms: Multinomial Naive Bayes, XGBoost, a CNN, and Random Forest.

The primary objective of this project is to explore the relationship between ingredients and cuisine types, and to develop machine learning models that classify recipes into cuisine categories.

Files in the Repository

Notebook (recipe_ingredients_classification.ipynb):
- This Jupyter notebook contains the entire analysis, including data preprocessing, exploratory data analysis (EDA), feature extraction, and machine learning model training and evaluation.
- It covers:
  1. Data Import and Extraction: Loading and extracting the dataset.
  2. Data Preprocessing: Cleaning and organizing data.
  3. Data Exploration: Visualizing ingredient distributions and cuisine counts.
  4. Modeling: Using multiple classifiers to predict the cuisine of recipes based on ingredients.
Data:
- The dataset is downloaded from Kaggle using the Kaggle API. It consists of recipes, each with a cuisine label and a list of ingredients.
Output:
- The notebook generates visualizations, including count plots, word clouds, and confusion matrices, to assess the performance of different models.

Installation and Setup

Prerequisites

You need Python 3.x with the following libraries installed:
- pandas
- matplotlib
- seaborn
- scikit-learn
- xgboost
- keras
- tensorflow
- wordcloud
- nltk
- numpy

To install the required libraries, run the following command:

pip install pandas matplotlib seaborn scikit-learn xgboost keras tensorflow wordcloud nltk numpy

Kaggle API Setup

To download the dataset from Kaggle, you must set up the Kaggle API in your environment.

Create a Kaggle account if you don't have one.
In your Kaggle account settings, create a new API token. This downloads a kaggle.json file.
Upload the kaggle.json file to your environment and copy it to the Kaggle credentials directory:
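```
!mkdir -p ~/.kaggle
!cp /content/kaggle.json ~/.kaggle/
# Restrict access to the key so the Kaggle CLI does not warn about permissions
!chmod 600 ~/.kaggle/kaggle.json
```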
Install the Kaggle package:
```
pip install kaggle
```

Download the dataset:

!kaggle datasets download -d kaggle/recipe-ingredients-dataset
!unzip -q recipe-ingredients-dataset.zip

Dataset Information

train.json: The training dataset, containing recipes with their ingredients and corresponding cuisine labels.
test.json: The test dataset, containing recipes with their ingredients but no cuisine labels; these are the labels the models predict.

Each entry in the dataset consists of:

ingredients: A list of ingredients used in the recipe.
cuisine: The cuisine type for the recipe (only available in the training set).
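
As a minimal sketch of this structure, the snippet below parses two made-up records shaped like the entries described above. The sample values are illustrative assumptions, not rows from the actual dataset; the same parsing would apply to the contents of train.json.

```python
import json

# Two hypothetical records in the shape described above (not real dataset rows).
sample_json = """
[
  {"cuisine": "italian", "ingredients": ["olive oil", "garlic", "tomatoes"]},
  {"cuisine": "mexican", "ingredients": ["corn tortillas", "salsa"]}
]
"""

recipes = json.loads(sample_json)

for recipe in recipes:
    # Each record is a dict with a cuisine label and a list of ingredient strings.
    print(recipe["cuisine"], len(recipe["ingredients"]), recipe["ingredients"])
```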

Key Sections in the Notebook

Data Exploration:
- Loading the dataset and performing basic checks.
- Visualizations:
  - Count plot of cuisines.
  - Distribution of the number of ingredients.
  - Boxplot of ingredients by cuisine.
  - Wordcloud representation of most frequent ingredients per cuisine.
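
The numbers behind these plots can be sketched without any plotting library. The example below uses a small hypothetical dataset (illustrative values only) and is not the notebook's own code; it only shows which quantities the count plot, the ingredient-count distribution, and the per-cuisine word clouds are built from.

```python
from collections import Counter

# Hypothetical mini-dataset in the train.json shape (illustrative values only).
recipes = [
    {"cuisine": "italian", "ingredients": ["olive oil", "garlic", "tomatoes"]},
    {"cuisine": "italian", "ingredients": ["pasta", "garlic"]},
    {"cuisine": "mexican", "ingredients": ["corn tortillas", "salsa", "garlic", "lime"]},
]

# Recipes per cuisine: the values a count plot would draw.
cuisine_counts = Counter(r["cuisine"] for r in recipes)

# Ingredients per recipe: the values behind the distribution plot and boxplot.
ingredient_lengths = [len(r["ingredients"]) for r in recipes]

# Ingredient frequencies per cuisine: the weights a word cloud would use.
top_by_cuisine = {}
for r in recipes:
    top_by_cuisine.setdefault(r["cuisine"], Counter()).update(r["ingredients"])

print(cuisine_counts.most_common())
print(ingredient_lengths)
print(top_by_cuisine["italian"].most_common(1))
```
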
Feature Engineering:
- Creating new features like the number of ingredients per recipe.
- Vectorizing the ingredients list into a bag-of-words representation using CountVectorizer.
Machine Learning Models:
- Multinomial Naive Bayes: A simple probabilistic baseline that classifies recipes from ingredient counts.
- XGBoost: A gradient-boosted decision tree model for classification.
- CNN (Convolutional Neural Network): A deep learning approach using Keras to predict cuisines based on ingredients.
- Random Forest: An ensemble of decision trees for predicting cuisines.
For each model, the notebook includes:
- Model training
- Predictions
- Evaluation metrics (accuracy, confusion matrix, classification report)
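
The train, predict, and evaluate cycle listed above can be sketched for one model. This example uses Multinomial Naive Bayes on hypothetical toy data; the data, the `whole_ingredients` helper, and the pipeline layout are assumptions made for illustration, not the notebook's actual code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def whole_ingredients(items):
    """Treat each ingredient string as one token (an assumption of this sketch)."""
    return items


# Hypothetical toy data (illustrative values only).
X_train = [
    ["olive oil", "garlic", "tomatoes"],
    ["pasta", "basil", "olive oil"],
    ["tomatoes", "basil", "mozzarella"],
    ["corn tortillas", "salsa", "lime"],
    ["black beans", "lime", "cilantro"],
    ["salsa", "cilantro", "avocado"],
]
y_train = ["italian", "italian", "italian", "mexican", "mexican", "mexican"]
X_test = [["pasta", "tomatoes", "garlic"], ["avocado", "lime", "black beans"]]
y_test = ["italian", "mexican"]

# Model training: vectorize the ingredient lists, then fit the classifier.
model = make_pipeline(CountVectorizer(analyzer=whole_ingredients), MultinomialNB())
model.fit(X_train, y_train)

# Predictions on held-out recipes.
predictions = model.predict(X_test)

# Evaluation metrics: accuracy, confusion matrix, classification report.
labels = ["italian", "mexican"]
accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions, labels=labels)

print(accuracy)
print(matrix)
print(classification_report(y_test, predictions, labels=labels))
```
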
Model Evaluation:
- Confusion matrix visualization for each model.
- Accuracy comparison across different models.
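
An accuracy comparison can be sketched as a loop over fitted models. The example below compares only Multinomial Naive Bayes and Random Forest on hypothetical toy data; XGBoost and the CNN are left out to keep the sketch dependency-light, and the accuracies it prints say nothing about results on the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB


def whole_ingredients(items):
    """Treat each ingredient string as one token (an assumption of this sketch)."""
    return items


# Hypothetical toy data (illustrative values only).
X_train_raw = [
    ["olive oil", "garlic", "tomatoes"],
    ["pasta", "basil", "olive oil"],
    ["tomatoes", "basil", "mozzarella"],
    ["corn tortillas", "salsa", "lime"],
    ["black beans", "lime", "cilantro"],
    ["salsa", "cilantro", "avocado"],
]
y_train = ["italian", "italian", "italian", "mexican", "mexican", "mexican"]
X_test_raw = [["pasta", "tomatoes", "garlic"], ["avocado", "lime", "black beans"]]
y_test = ["italian", "mexican"]

# Vectorize once so every model sees the same features.
vectorizer = CountVectorizer(analyzer=whole_ingredients)
X_train = vectorizer.fit_transform(X_train_raw)
X_test = vectorizer.transform(X_test_raw)

models = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Fit each model and record its accuracy on the same held-out recipes.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Report models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```
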

How to Run the Notebook

Clone the repository:

git clone https://github.com/yourusername/recipe-ingredients-cuisine-classification.git
cd recipe-ingredients-cuisine-classification

Install the required libraries:
```
pip install -r requirements.txt
```

Run the Jupyter notebook:

jupyter notebook recipe_ingredients_classification.ipynb

Follow the instructions in the notebook to explore the dataset and run the models.

Results and Visualizations

Distribution of Cuisines: Visualizing how the recipes are distributed across different cuisine categories.
Number of Ingredients: Analyzing how the number of ingredients varies across different cuisines.
Top 20 Ingredients: A bar plot showing the most common ingredients in the dataset.
Wordclouds: Word clouds for each cuisine showing the most frequent ingredients.
Confusion Matrices: For each model, a confusion matrix to evaluate the classification performance.

Future Improvements

Hyperparameter Tuning: Use grid search or random search to tune the hyperparameters for each model and improve accuracy.
Cross-validation: Implement cross-validation to ensure the robustness of the models.
Deep Learning: Experiment with more complex deep learning architectures like LSTM or Transformers for ingredient-based classification.
Data Augmentation: Apply data augmentation techniques to expand the dataset and improve model performance.
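
The first two items could be combined, since scikit-learn's grid search scores each candidate setting with cross-validation. The sketch below tunes the Naive Bayes smoothing parameter on hypothetical toy data; the parameter grid, the fold count, and the `whole_ingredients` helper are assumptions chosen for illustration, not settings taken from the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def whole_ingredients(items):
    """Treat each ingredient string as one token (an assumption of this sketch)."""
    return items


# Hypothetical toy data (illustrative values only).
recipes = [
    ["olive oil", "garlic", "tomatoes"],
    ["pasta", "basil", "olive oil"],
    ["tomatoes", "basil", "mozzarella"],
    ["pasta", "garlic", "mozzarella"],
    ["corn tortillas", "salsa", "lime"],
    ["black beans", "lime", "cilantro"],
    ["salsa", "cilantro", "avocado"],
    ["corn tortillas", "avocado", "black beans"],
]
cuisines = ["italian"] * 4 + ["mexican"] * 4

pipeline = Pipeline([
    ("vectorizer", CountVectorizer(analyzer=whole_ingredients)),
    ("classifier", MultinomialNB()),
])

# Try each alpha with 2-fold cross-validation and keep the best mean accuracy.
search = GridSearchCV(
    pipeline,
    param_grid={"classifier__alpha": [0.1, 0.5, 1.0]},
    cv=2,
    scoring="accuracy",
)
search.fit(recipes, cuisines)

print(search.best_params_)
print(round(search.best_score_, 2))
```

On the real dataset a larger fold count such as 5 would be a more typical choice; 2 is used here only because the toy data has four recipes per cuisine.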

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Dataset sourced from Kaggle: Recipe Ingredients Dataset
Libraries used: pandas, matplotlib, seaborn, scikit-learn, xgboost, keras, tensorflow, wordcloud, nltk, and numpy.

Contact

If you have any questions or suggestions, feel free to open an issue or contact the repository owner.
