UtshoData/BestMovieDetailsAnalyticsUsingTableau-main-

BestMovieDetailsAnalyticsUsingTableau-main-

Introduction

IMDb stands for the Internet Movie Database. It is an online database of information related to films, television programs, home videos, video games, and streaming content. I scraped more than 700 records from 7 pages of this website to surface some interesting findings.

Dataset:

The dataset used for this analysis is the IMDb dataset of the top 700 movies. It contains the following fields:
Series_Title: Name of the movie
Released_Year: Year in which the movie was released
Certificate: Certificate earned by the movie
Runtime: Total runtime of the movie
Genre: Genre of the movie
IMDb_Rating: Rating of the movie on the IMDb site
Meta_score: Metascore earned by the movie
Director: Name of the director
Star1, Star2, Star3, Star4: Names of the cast
Gross: Money earned by the movie
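
The scraped fields usually need light cleaning before they can be charted. The following is a minimal sketch of that step, assuming the raw values arrive as strings such as "120 min" for Runtime, "1,234,567" for Gross, and a comma-separated list for Genre. Those raw formats, the helper names, and the sample row are assumptions made for illustration and are not taken from the repository.

```python
def parse_runtime(text):
    """Convert a runtime such as '120 min' to an integer number of minutes."""
    return int(text.split()[0])


def parse_gross(text):
    """Convert a gross such as '1,234,567' to an integer; empty values become None."""
    cleaned = text.replace(",", "").replace("$", "").strip()
    return int(cleaned) if cleaned else None


def parse_genres(text):
    """Split a comma-separated genre string into a list of genre names."""
    return [genre.strip() for genre in text.split(",") if genre.strip()]


def clean_row(row):
    """Return a copy of one scraped row with typed values (assumed raw formats)."""
    return {
        **row,
        "Released_Year": int(row["Released_Year"]),
        "Runtime": parse_runtime(row["Runtime"]),
        "Genre": parse_genres(row["Genre"]),
        "IMDb_Rating": float(row["IMDb_Rating"]),
        "Meta_score": int(row["Meta_score"]) if row["Meta_score"] else None,
        "Gross": parse_gross(row["Gross"]),
    }


# Placeholder row for illustration only; it is not real scraped data.
raw = {
    "Series_Title": "Example Movie",
    "Released_Year": "2000",
    "Certificate": "PG-13",
    "Runtime": "120 min",
    "Genre": "Action, Drama",
    "IMDb_Rating": "8.1",
    "Meta_score": "",
    "Director": "Example Director",
    "Gross": "1,234,567",
}
cleaned = clean_row(raw)
print(cleaned["Runtime"], cleaned["Genre"], cleaned["Gross"])
```

Missing Metascore or Gross values are kept as None in this sketch so that later averages can skip them instead of counting them as zero.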

Problem Statement:

1. What are the top 10 movies based on the number of votes on IMDb?
2. How many movies were released in each year according to IMDb's database?
3. Top 10 genres by count
4. Circle graph of genre vs. rating (the size of each circle depends on the count of movies released per year)
5. Gross and rating by movie name and genre
6. How many movies have each certificate (e.g., G, PG, PG-13, R) on IMDb?
7. Which are the top 10 genres based on Metascore, and what are their average Metascore ratings?
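
The dashboard answers these questions in Tableau, but the same aggregations can be cross-checked in a few lines of Python. The sketch below covers questions 2, 3, 6, and 7 under the assumption that each row is a dictionary with typed values and a list of genres. The function names and the toy rows are hypothetical. Question 1 is left out because a vote-count column is not among the fields listed above.

```python
from collections import Counter, defaultdict
from statistics import mean

# Toy rows for illustration only; they are not part of the scraped dataset.
movies = [
    {"Series_Title": "Movie A", "Released_Year": 2000, "Certificate": "R",
     "Genre": ["Drama"], "Meta_score": 80},
    {"Series_Title": "Movie B", "Released_Year": 2000, "Certificate": "PG-13",
     "Genre": ["Action", "Drama"], "Meta_score": 70},
    {"Series_Title": "Movie C", "Released_Year": 2001, "Certificate": "R",
     "Genre": ["Action"], "Meta_score": None},
    {"Series_Title": "Movie D", "Released_Year": 2001, "Certificate": "R",
     "Genre": ["Comedy"], "Meta_score": 90},
]


def movies_per_year(rows):
    """Question 2: number of movies released in each year."""
    return Counter(row["Released_Year"] for row in rows)


def certificate_counts(rows):
    """Question 6: number of movies holding each certificate."""
    return Counter(row["Certificate"] for row in rows)


def top_genres_by_count(rows, n=10):
    """Question 3: the n most frequent genres (a movie counts once per genre)."""
    counts = Counter(genre for row in rows for genre in row["Genre"])
    return counts.most_common(n)


def top_genres_by_metascore(rows, n=10):
    """Question 7: the n genres with the highest average Metascore."""
    scores = defaultdict(list)
    for row in rows:
        if row["Meta_score"] is None:
            continue  # skip movies without a Metascore
        for genre in row["Genre"]:
            scores[genre].append(row["Meta_score"])
    averages = {genre: mean(values) for genre, values in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


print(movies_per_year(movies))
print(certificate_counts(movies))
print(top_genres_by_count(movies))
print(top_genres_by_metascore(movies))
```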

Tableau dashboard view: Here

[Image: Tableau dashboard view]

Important Findings

1. Analyzing movies with the highest number of votes shows which films have drawn the most attention and engagement from the IMDb user community. The Shawshank Redemption has the highest rating (9.3) of all the movies.
2. Examining the distribution of movie certificates can offer insights into audience demographics and preferences. For example, a higher number of movies with an "R" certificate may indicate a focus on an adult audience. Most of the top-rated movies carry an R certificate, followed by PG-13.
3. Observing the distribution of movies over the years can reveal trends in the film industry. A spike in releases after 1998 might be associated with a significant cultural or technological shift.
4. Analyzing the total gross for movies each year can highlight the financial success of the film industry over time. Peaks in the gross may coincide with major blockbuster releases or economic factors. Avatar had the highest gross at $720M, and The Dark Knight was in second position.
5. Audiences love the Action, Adventure, and Science Fiction genres: the maximum rating was 8.80, and more than 30 movies have been produced in this category.
6. Comparing Metascore ratings for the top 10 genres can offer a perspective on how critical reviews align with audience opinions. Consistency or divergence between the two metrics may indicate interesting dynamics in genre preferences.
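
One way to make the comparison in finding 6 concrete is to put both scores on the same scale and measure the gap per genre. The sketch below multiplies the IMDb rating (0 to 10) by 10 to match the Metascore scale (0 to 100) and averages the difference for each genre. This is an illustrative approach and not necessarily the calculation behind the dashboard. The function name and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rating_gap_by_genre(rows):
    """Mean of (IMDb rating * 10 - Metascore) per genre.

    A positive value means audiences rate the genre higher than critics do;
    a negative value means critics rate it higher.
    """
    gaps = defaultdict(list)
    for row in rows:
        if row["Meta_score"] is None:
            continue  # no critic score to compare against
        gap = row["IMDb_Rating"] * 10 - row["Meta_score"]
        for genre in row["Genre"]:
            gaps[genre].append(gap)
    return {genre: round(mean(values), 1) for genre, values in gaps.items()}


# Toy rows for illustration only; they are not part of the scraped dataset.
sample = [
    {"Genre": ["Drama"], "IMDb_Rating": 8.5, "Meta_score": 80},
    {"Genre": ["Drama", "Action"], "IMDb_Rating": 8.0, "Meta_score": 90},
    {"Genre": ["Action"], "IMDb_Rating": 9.0, "Meta_score": None},
]
gap_by_genre = rating_gap_by_genre(sample)
print(gap_by_genre)
```

Genres with a gap near zero show agreement between critics and audiences, while large gaps in either direction point to the divergence described above.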

Build From Sources and Run the Selenium Scraper

  1. Clone the repo
git clone https://github.com/UtshoData/BestMovieDetailsAnalyticsUsingTableau-main-.git
  2. Initialize and activate a virtual environment
virtualenv venv
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt
  4. Download ChromeDriver from https://chromedriver.chromium.org/downloads
  5. Run the scraper
python selenium_scraper/scraper.py --chromedriver_path <path_to_chromedriver>
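
For readers curious how the --chromedriver_path flag could be consumed, here is a minimal sketch of a command-line entry point. It is not the contents of selenium_scraper/scraper.py, which are not shown here. The function names are hypothetical, the Selenium 4 style Service API is assumed, and opening the IMDb home page stands in for the real scraping logic.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --chromedriver_path mirrors the flag used above."""
    parser = argparse.ArgumentParser(description="Scrape movie details from IMDb.")
    parser.add_argument(
        "--chromedriver_path",
        required=True,
        help="Path to the ChromeDriver executable.",
    )
    return parser.parse_args(argv)


def make_driver(chromedriver_path):
    """Start Chrome through the given ChromeDriver binary (assumes Selenium 4)."""
    # Imported here so that argument parsing works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=chromedriver_path))


def main(argv=None):
    """Hypothetical entry point: open a page, print its title, then clean up."""
    args = parse_args(argv)
    driver = make_driver(args.chromedriver_path)
    try:
        driver.get("https://www.imdb.com/")
        print(driver.title)
    finally:
        driver.quit()  # always release the browser, even if scraping fails


# Example call (requires Chrome, ChromeDriver, and Selenium):
# main(["--chromedriver_path", "/path/to/chromedriver"])
```

Closing the driver in a finally block keeps stray browser processes from piling up when a page fails to load.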