rnair7163/Restaurant-Recommendation-System-using-Yelp-Dataset: Building a recommendation system for customers using the Yelp dataset of restaurants.

Restaurant-Recommendation-System-using-Yelp-Dataset

This is our academic project for CSP-571 "Data Preparation and Analysis". In this project we built a personalized recommender web app using the Yelp dataset of restaurants. We tested several models (Pure Collaborative, Approximate Nearest Neighbour, K-NN, Naive Bayes and Hybrid Matrix Factorization) over a range of hyperparameters, which were tuned with the "scikit-optimize" library using Bayesian optimization. We evaluated the models using AUC, a decision-support metric that checks whether the model correctly separates the items a customer likes from the items they do not. We chose it because, in our case, figuring out customer preference in general is more important and practical than predicting exact ratings. For deployment, we used the Angular 8 and Flask frameworks.
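To make the AUC metric concrete, here is a minimal sketch in plain Python. It is not the project's evaluation code: the function name `auc_score` and the toy scores and labels are assumptions made for illustration. It computes AUC as the fraction of (liked, disliked) item pairs that the model ranks in the right order.

```python
def auc_score(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    scores: predicted preference scores, higher means "more likely liked".
    labels: 1 for liked items, 0 for disliked items.
    Tied scores count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data (made up): two liked and two disliked restaurants.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auc = auc_score(example_scores, example_labels)
```

An AUC of 1.0 means every liked item is scored above every disliked one, while 0.5 is what random scores would give on average. This pairwise form is quadratic in the number of items, so it is only meant to show the definition.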

Datasets:-

Primary Dataset

The primary dataset for our model was the Yelp dataset. From it, we used three files: business.json, reviews.json and users.json.

Secondary Dataset

The secondary dataset for our model was the median income for each postal code, which we then mapped to businesses.
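The mapping from postal code to business can be sketched as a simple lookup. This is an illustration under assumptions, not the project's code: the helper `attach_median_income`, the `postal_code` and `median_income` field names, and the sample values are all hypothetical.

```python
def attach_median_income(businesses, income_by_zip):
    """Return copies of the business records with a median_income field added.

    businesses: list of dicts, each assumed to carry a "postal_code" key.
    income_by_zip: dict mapping postal code (as a string) to median income.
    Businesses whose postal code is missing from the lookup get None.
    """
    enriched = []
    for business in businesses:
        row = dict(business)  # copy so the input records stay unchanged
        zip_code = str(row.get("postal_code", "")).strip()
        row["median_income"] = income_by_zip.get(zip_code)
        enriched.append(row)
    return enriched


# Made-up sample values, for illustration only.
sample_income = {"61820": 50000}
sample_businesses = [
    {"business_id": "b1", "postal_code": "61820"},
    {"business_id": "b2", "postal_code": "99999"},
]
enriched = attach_median_income(sample_businesses, sample_income)
```

Keeping unmatched businesses with a `None` income, rather than dropping them, leaves the decision about missing values to a later step.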

Data Cleaning and Data Preparation

Business

We cleaned and prepared the Business dataset as follows:

We found no duplicate entries, as there are no recurring business ids.
Some cities were recorded under more than one spelling and so were not counted as one. For example, St. Joseph was also recorded as Saint Joseph, which created two different records for the same city.
Standardized the date format of the hours variable.
Considered only businesses tagged open = 1.
Kept only Illinois-based restaurants, since we focused on Illinois for building the recommender system.
Out of 436 tags, we kept the top 60 by popularity.
Incorporated the median income for each zip code from the secondary dataset.
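A few of the cleaning steps above can be sketched in plain Python. This is a rough illustration, not the project's pipeline: the function names are hypothetical, and the field names `is_open`, `state` and `categories` (a comma-separated string of tags) are assumptions about the record layout.

```python
import re
from collections import Counter


def normalize_city(name):
    """Collapse spelling variants such as 'St. Joseph' and 'Saint Joseph'."""
    cleaned = re.sub(r"\s+", " ", name.strip().lower())
    # Expand a leading "st", "st." or "st," into "saint".
    cleaned = re.sub(r"^st[.,]?\s+", "saint ", cleaned)
    return cleaned.title()


def filter_businesses(businesses, state="IL"):
    """Keep only open businesses located in the given state."""
    return [b for b in businesses
            if b.get("is_open") == 1 and b.get("state") == state]


def top_tags(businesses, n):
    """Return the n most frequent tags across all businesses."""
    counts = Counter()
    for b in businesses:
        tags = b.get("categories") or ""
        counts.update(t.strip() for t in tags.split(",") if t.strip())
    return [tag for tag, _ in counts.most_common(n)]


# Made-up records, for illustration only.
sample = [
    {"city": "St, Joseph", "state": "IL", "is_open": 1, "categories": "Restaurants, Pizza"},
    {"city": "Saint Joseph", "state": "IL", "is_open": 0, "categories": "Restaurants, Bars"},
    {"city": "Champaign", "state": "IL", "is_open": 1, "categories": "Restaurants, Pizza"},
    {"city": "Las Vegas", "state": "NV", "is_open": 1, "categories": "Restaurants"},
]
kept = filter_businesses(sample)
popular = top_tags(sample, 2)
```

With the real data, `top_tags(businesses, 60)` would correspond to the "top 60 tags" step. The prefix rule in `normalize_city` only handles the "St" and "Saint" case from the example above, so other spelling variants would need their own rules.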

Users

Computed the tenure of each user from the year the user joined.
Computed TF-IDF for each tag, to be used as weights during model fitting.
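The two user features can be sketched as follows. The README does not say how the TF-IDF "documents" were defined, so this sketch assumes that each user is a document and that its terms are the tags of the restaurants the user reviewed. The function names, the `yelping_since` date format and the sample data are likewise assumptions.

```python
import math
from collections import Counter


def tenure_years(yelping_since, reference_year):
    """Tenure in years, from a 'YYYY-MM-DD' join date and a reference year."""
    start_year = int(yelping_since[:4])
    return reference_year - start_year


def tag_tfidf(user_tags):
    """TF-IDF weight of each tag per user.

    user_tags: dict mapping user id to the list of tags (with repeats)
    collected from the restaurants that user reviewed.
    TF is the tag's share of the user's tags; IDF is log(N / df), where
    df is the number of users who have the tag at least once.
    """
    n_users = len(user_tags)
    doc_freq = Counter()
    for tags in user_tags.values():
        doc_freq.update(set(tags))
    weights = {}
    for user, tags in user_tags.items():
        counts = Counter(tags)
        total = sum(counts.values())
        weights[user] = {
            tag: (count / total) * math.log(n_users / doc_freq[tag])
            for tag, count in counts.items()
        }
    return weights


# Made-up sample, for illustration only.
sample_tags = {"u1": ["Pizza", "Pizza", "Bars"], "u2": ["Pizza"]}
weights = tag_tfidf(sample_tags)
```

In this toy sample every user has the tag "Pizza", so its weight is zero, while "Bars" is distinctive for `u1` and gets a positive weight. That is the effect TF-IDF weighting is meant to have: tags shared by nearly everyone say little about an individual user's taste.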

Review

We cross-checked the ‘review_count’ variable in the user dataset by aggregating the number of reviews given by each user in the review dataset.
We found cases where the same user rated one restaurant several times over their history. In such cases, we kept only the most recent review, since it reflects the user’s latest preference.
To remove user bias from the ratings, we standardized each user’s ratings by subtracting the user’s mean rating and converting the result to (-1,1). We did this because we wanted to focus on ranking the restaurants a user liked and disliked in the correct order, rather than on predicting the user’s rating for each restaurant, which would result in high variance over time.
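The deduplication and bias-removal steps can be sketched as below. This is one possible reading, not the project's code: the README does not say how the centered ratings were converted to (-1,1), so the sketch assumes a division by 4, the largest possible deviation on a 1 to 5 star scale. The function names and the field names `user_id`, `business_id`, `date` and `stars` are also assumptions.

```python
from collections import defaultdict


def keep_latest_reviews(reviews):
    """Keep only the most recent review per (user, business) pair."""
    latest = {}
    for review in reviews:
        key = (review["user_id"], review["business_id"])
        # ISO 'YYYY-MM-DD' dates sort correctly as plain strings.
        if key not in latest or review["date"] > latest[key]["date"]:
            latest[key] = review
    return list(latest.values())


def center_ratings(reviews, max_deviation=4.0):
    """Subtract each user's mean star rating, then scale into [-1, 1]."""
    stars_by_user = defaultdict(list)
    for review in reviews:
        stars_by_user[review["user_id"]].append(review["stars"])
    means = {user: sum(s) / len(s) for user, s in stars_by_user.items()}
    return [
        {**review, "rating": (review["stars"] - means[review["user_id"]]) / max_deviation}
        for review in reviews
    ]


# Made-up reviews, for illustration only: u1 rated b1 twice.
sample_reviews = [
    {"user_id": "u1", "business_id": "b1", "date": "2018-01-01", "stars": 1},
    {"user_id": "u1", "business_id": "b1", "date": "2019-06-01", "stars": 5},
    {"user_id": "u1", "business_id": "b2", "date": "2019-01-01", "stars": 3},
    {"user_id": "u1", "business_id": "b3", "date": "2019-02-01", "stars": 1},
]
latest = keep_latest_reviews(sample_reviews)
centered = center_ratings(latest)
```

After centering, a positive value means the user liked the restaurant more than their own average and a negative value means less, which is what the ranking objective needs. Deduplicating before centering keeps a user's older, superseded ratings out of their mean.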

Exploratory Data Analysis

Business

The Business dataset contains geographical information about 192,609 businesses, along with categories and attributes such as average star rating, hours and whether they offer parking.
It consists mostly of businesses from North America.
The Yelp business dataset covers a variety of businesses, indicated by the column ‘attributes’. The category ‘Restaurants’ had the highest number of entries.

Users

The User dataset covers about 1,637,138 users and includes information such as how long ago the user joined Yelp, the number of reviews they have written, the number of specific compliments received, and their friend mapping on Yelp.
We observed outliers in the value of review_count in the User dataset.
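The README does not say how the outliers in review_count were identified. One common approach, shown here purely as an assumption, is the 1.5 × IQR rule. The function name and the sample counts are made up for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up review counts: one user with far more reviews than the rest.
sample_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
outliers = iqr_outliers(sample_counts)
```

Review counts are usually heavily right-skewed, so a rule like this tends to flag a small group of very active users. Whether to cap, transform or keep them is a modelling choice that this sketch does not make.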

Review

We performed sentiment analysis on the text of the review dataset. People seem more likely to write a review for a positive experience than a negative one.
Most users gave the restaurants more positive ratings than negative ones.

