This project uses a large public dataset and Pandas to select a small subset of the data based on specific criteria.
Objective: A new cooking website needs 30 unique modern dessert recipes. They are to be selected from a public dataset of food.com recipes.
image: screenshot from Kaggle.com - selected dataset
The relevant data, available on kaggle.com, consists of two datasets:
- Recipe dataset (522,517 recipes from 312 different categories)
- Reviews dataset (1,401,982 reviews from 271,907 users)
Both files are in CSV format and are loaded into pandas DataFrames.
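As a rough sketch of the import step, the snippet below loads two CSV sources with pd.read_csv. The inline text stands in for the real Kaggle files so the example runs on its own; the column names (RecipeId, Name, RecipeCategory, Rating and so on) and any file names are assumptions for illustration, not confirmed details of the project.

```python
import io
import pandas as pd

# Tiny inline stand-ins for the two Kaggle files. In practice you would pass
# file paths (hypothetical names such as "recipes.csv" and "reviews.csv").
recipes_csv = io.StringIO(
    "RecipeId,Name,RecipeCategory\n"
    "1,Lemon Tart,Dessert\n"
    "2,Beef Stew,Meat\n"
)
reviews_csv = io.StringIO(
    "ReviewId,RecipeId,AuthorId,Rating\n"
    "10,1,100,5\n"
    "11,1,101,4\n"
    "12,2,100,3\n"
)

recipes = pd.read_csv(recipes_csv)
reviews = pd.read_csv(reviews_csv)

# Quick sanity checks on size and column types after loading.
print(recipes.shape, reviews.shape)
print(recipes.dtypes)
```

Checking shape and dtypes right after loading is a cheap way to confirm that the row counts match the figures quoted for the dataset before any filtering starts.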
To find the right selection among more than half a million recipes, both datasets are processed with operations that include:
- value counts
- df.loc
- data type and field conversions
- left join
- saving to csv, and more
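The operations listed above could be combined roughly as follows. This is a minimal sketch on made-up rows, not the project's actual code: the column names, the cutoff date used to define "modern", and the ranking by mean rating are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the recipe and review tables.
recipes = pd.DataFrame({
    "RecipeId": [1, 2, 3, 4],
    "Name": ["Lemon Tart", "Beef Stew", "Chocolate Mousse", "Apple Pie"],
    "RecipeCategory": ["Dessert", "Meat", "Dessert", "Dessert"],
    "DatePublished": ["2019-05-01", "2018-01-15", "2005-03-10", "2020-07-20"],
})
reviews = pd.DataFrame({
    "RecipeId": [1, 1, 3, 2],
    "Rating": [5, 4, 3, 4],
})

# Value counts: see how many recipes fall into each category.
category_counts = recipes["RecipeCategory"].value_counts()

# Data type conversion: parse the date strings into datetime values.
recipes["DatePublished"] = pd.to_datetime(recipes["DatePublished"])

# df.loc: keep desserts published after an assumed "modern" cutoff date.
is_dessert = recipes["RecipeCategory"] == "Dessert"
is_modern = recipes["DatePublished"] >= pd.Timestamp("2015-01-01")
desserts = recipes.loc[is_dessert & is_modern, ["RecipeId", "Name", "DatePublished"]]

# Aggregate the reviews per recipe, then left join so that desserts
# without any review are kept (their rating columns become NaN).
review_stats = reviews.groupby("RecipeId", as_index=False).agg(
    mean_rating=("Rating", "mean"),
    n_reviews=("Rating", "count"),
)
selected = desserts.merge(review_stats, on="RecipeId", how="left")

# Rank by mean rating; unreviewed recipes sort last.
ranked = selected.sort_values("mean_rating", ascending=False, na_position="last")
print(ranked)
```

A left join is used here so that the recipe table stays the master list: an inner join would silently drop every dessert that has no reviews yet.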
The result is a final dataset of 30 dessert recipes, saved as a CSV file for future use.
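One possible way to produce and store such a final selection is sketched below, under stated assumptions: the candidate rows, the column names, the output file name dessert_selection.csv and the ranking by mean rating are hypothetical, and the target size is reduced to 3 so the toy data can satisfy it (the project uses 30).

```python
import os
import tempfile
import pandas as pd

# Hypothetical candidate desserts; RecipeId 4 appears twice on purpose.
candidates = pd.DataFrame({
    "RecipeId": [1, 4, 4, 7],
    "Name": ["Lemon Tart", "Apple Pie", "Apple Pie", "Panna Cotta"],
    "mean_rating": [4.5, 4.8, 4.8, 4.2],
})

N_RECIPES = 3  # the project targets 30; 3 keeps this toy example small

final = (
    candidates.drop_duplicates(subset="RecipeId")  # enforce unique recipes
    .sort_values("mean_rating", ascending=False)   # best rated first
    .head(N_RECIPES)                               # keep the top N
    .reset_index(drop=True)
)

# Write to a temporary directory here; a real project would pick its own path.
out_path = os.path.join(tempfile.mkdtemp(), "dessert_selection.csv")
final.to_csv(out_path, index=False)

# Read the file back to confirm it round-trips as expected.
reloaded = pd.read_csv(out_path)
print(reloaded)
```

Passing index=False keeps the pandas row index out of the file, so the saved CSV contains only the recipe columns.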