Ajayay/Data_cleaning

Data cleaning on Amazon Fine Food Reviews.

The dataset is taken from Kaggle (kaggle.com).

Objective:

  • Prune the data by eliminating duplicate entries, removing stopwords, and stemming, in order to obtain the positive and negative reviews, which can then be used to train models.
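
The steps named in the objective can be sketched in plain Python as follows. This is a minimal illustration, not the repository's actual code. Several choices here are assumptions: duplicates are identified by the `(UserId, Time, Text)` triple, scores above 3 count as positive and below 3 as negative (reviews with a score of 3 are dropped), the stopword list is a tiny hand-written sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter.

```python
import re

# Tiny illustrative stopword list (assumption); a real run would use a fuller list.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "i", "was", "for", "in"}


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_text(text):
    """Strip HTML tags, lowercase, drop stopwords, and stem the remaining words."""
    text = re.sub(r"<[^>]+>", " ", text)
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(naive_stem(w) for w in words if w not in STOPWORDS)


def clean_reviews(rows):
    """Deduplicate reviews and label them as positive or negative.

    `rows` is an iterable of dicts keyed by the dataset's column names.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Assumption: the same user, timestamp, and text means a duplicate review.
        key = (row["UserId"], row["Time"], row["Text"])
        if key in seen:
            continue
        seen.add(key)
        score = int(row["Score"])
        if score == 3:
            # Assumption: a score of 3 is treated as neutral and dropped.
            continue
        cleaned.append({
            "label": "positive" if score > 3 else "negative",
            "text": clean_text(row["Text"]),
        })
    return cleaned
```

Keeping the first occurrence of each duplicate and cleaning the text only for rows that survive keeps the pass to a single loop over the data.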

Context:

  1. This dataset consists of reviews of fine foods from Amazon.

  2. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012.

  3. Reviews include product and user information, ratings, and a plain text review.

  4. It also includes reviews from all other Amazon categories.

  • Number of reviews: 568,454

  • Number of users: 256,059

  • Number of products: 74,258

  • Timespan: Oct 1999 - Oct 2012

  • Number of Attributes/Columns in data: 10

Data includes:

  • 260 users with > 50 reviews

Feature information:

  • Id: Row ID

  • ProductId: Unique identifier for the product

  • UserId: Unique identifier for the user

  • ProfileName: Profile name of the user

  • HelpfulnessNumerator: Number of users who found the review helpful

  • HelpfulnessDenominator: Number of users who indicated whether they found the review helpful

  • Score: Rating between 1 and 5

  • Time: Timestamp for the review

  • Summary: Brief summary of the review

  • Text: Text of the review
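
To make the columns above concrete, here is a hedged sketch of reading rows into typed records with Python's standard `csv` module. The helper names `parse_review` and `read_reviews` are hypothetical, the helpfulness ratio is a derived value that is not a column in the data, and the code assumes `Time` is stored as a Unix timestamp in seconds.

```python
import csv
from datetime import datetime, timezone


def parse_review(row):
    """Convert one CSV row (a dict of strings) into typed fields."""
    helpful = int(row["HelpfulnessNumerator"])
    voted = int(row["HelpfulnessDenominator"])
    return {
        "product_id": row["ProductId"],
        "user_id": row["UserId"],
        "score": int(row["Score"]),
        # Derived value: share of voters who found the review helpful.
        # None when nobody voted, to avoid dividing by zero.
        "helpfulness": helpful / voted if voted else None,
        # Assumption: Time holds Unix epoch seconds.
        "reviewed_at": datetime.fromtimestamp(int(row["Time"]), tz=timezone.utc),
        "summary": row["Summary"],
        "text": row["Text"],
    }


def read_reviews(csv_lines):
    """Parse an open CSV file (or any iterable of lines) that has a header row."""
    return [parse_review(row) for row in csv.DictReader(csv_lines)]
```

For a file on disk, pass an open handle, for example `read_reviews(open(path, newline="", encoding="utf-8"))`, where `path` points at the downloaded CSV.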
