Create an ETL pipeline to import raw data, transform it into DataFrames, and upload it to a database.

spicyyramen/Movies-ETL

Movies ETL

An automated pipeline that extracts Wikipedia data, Kaggle metadata, and MovieLens rating data, transforms them, and loads the results into a PostgreSQL database.

Write ETL Function to Read in Data Files

Created an ETL function that imports the Wikipedia JSON file, the Kaggle metadata file, and the MovieLens ratings data file, then converts each one into a Pandas DataFrame.
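The extract step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual function: the name `extract_to_dataframes`, its parameters, and the assumption that the Wikipedia file is a JSON array of movie records are all hypothetical.

```python
import json

import pandas as pd


def extract_to_dataframes(wiki_path, kaggle_path, ratings_path):
    """Read the three raw files and return them as DataFrames.

    Hypothetical helper: the function name and argument names are
    illustrative and do not come from the repository.
    """
    # Assumption: the Wikipedia file is a JSON array of movie records.
    with open(wiki_path, mode="r", encoding="utf-8") as f:
        wiki_movies_raw = json.load(f)
    wiki_df = pd.DataFrame(wiki_movies_raw)

    # low_memory=False makes pandas infer each column's dtype from the
    # whole file instead of chunk by chunk, which avoids mixed-type columns.
    kaggle_df = pd.read_csv(kaggle_path, low_memory=False)
    ratings_df = pd.read_csv(ratings_path)

    return wiki_df, kaggle_df, ratings_df
```

Returning the three DataFrames as a tuple keeps the extract step separate from the later transform and load steps, so each stage can be checked on its own.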

Example Code - ETL Function

Wikipedia DataFrame

Kaggle DataFrame

MovieLens Ratings DataFrame

Extract and Transform Wikipedia Data

Using Python and Pandas, refactored the ETL code to extract and transform the Wikipedia data so that it can be merged with the Kaggle metadata.
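One possible shape for this transform step, including a try-except guard, is sketched below. The field names (`imdb_link`, `Directed by`, `Release date`), the renamed columns, and the use of an IMDb ID as the merge key are assumptions made for illustration; the repository's own cleaning rules may differ.

```python
import pandas as pd

# Hypothetical mapping of alternate Wikipedia keys to tidy column names.
COLUMN_RENAMES = {"Directed by": "Director", "Release date": "Released"}


def clean_movie(movie):
    """Return a copy of one raw record with the assumed keys renamed."""
    movie = dict(movie)  # copy so the caller's record is not mutated
    for old_name, new_name in COLUMN_RENAMES.items():
        if old_name in movie:
            movie[new_name] = movie.pop(old_name)
    return movie


def transform_wiki(wiki_movies_raw):
    """Clean raw Wikipedia records and add an imdb_id column to merge on."""
    # Assumption: only records carrying an IMDb link are useful for merging.
    cleaned = [clean_movie(m) for m in wiki_movies_raw if "imdb_link" in m]
    wiki_df = pd.DataFrame(cleaned)

    try:
        # Pull an ID such as "tt0111161" out of the link text.
        wiki_df["imdb_id"] = wiki_df["imdb_link"].str.extract(
            r"(tt\d{7,8})", expand=False
        )
        wiki_df = wiki_df.dropna(subset=["imdb_id"])
        wiki_df = wiki_df.drop_duplicates(subset="imdb_id")
    except (KeyError, AttributeError) as err:
        # KeyError: no imdb_link column; AttributeError: non-string values.
        print(f"Could not extract IMDb IDs: {err}")

    return wiki_df
```

The try-except keeps the pipeline running when the input has no usable links, returning the DataFrame as it stands rather than stopping the whole ETL run.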


Example Code - ETL Function

Example Code - Try-Except Statement

Combined DataFrame

Extract and Transform Kaggle Data

  • Transformed Kaggle metadata and MovieLens ratings data into separate DataFrames
  • Merged Kaggle DataFrame with Wikipedia DataFrame to create movies_df
  • Merged MovieLens ratings DataFrame with movies_df DataFrame to create movies_with_ratings_df
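The two merges listed above might look something like the sketch below. The join keys (`imdb_id` between the Wikipedia and Kaggle data, Kaggle `id` against MovieLens `movieId`), the column suffixes, and the choice to pivot ratings into per-value counts before merging are assumptions, not details confirmed by the repository.

```python
import pandas as pd


def merge_movies(wiki_df, kaggle_df, ratings_df):
    """Build movies_df and movies_with_ratings_df from the three inputs."""
    # Assumption: both sources share an "imdb_id" column to join on.
    movies_df = pd.merge(
        wiki_df, kaggle_df, on="imdb_id", suffixes=("_wiki", "_kaggle")
    )

    # Count how many times each movie received each rating value,
    # giving one row per movie and one column per rating.
    rating_counts = (
        ratings_df.groupby(["movieId", "rating"]).size().unstack(fill_value=0)
    )
    rating_counts.columns = [f"rating_{col}" for col in rating_counts.columns]

    # Assumption: Kaggle "id" holds the same integer IDs as MovieLens "movieId".
    movies_with_ratings_df = pd.merge(
        movies_df, rating_counts, left_on="id", right_index=True, how="left"
    )

    # Movies without any ratings get zero counts instead of NaN.
    rating_cols = list(rating_counts.columns)
    movies_with_ratings_df[rating_cols] = (
        movies_with_ratings_df[rating_cols].fillna(0)
    )

    return movies_df, movies_with_ratings_df
```

A left join is used for the ratings merge so that movies with no ratings stay in `movies_with_ratings_df`.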

Example Code

Movies DataFrame

Movies with Ratings DataFrame

Create Movie Database

Added the movies_df DataFrame and the MovieLens ratings CSV data to a PostgreSQL database.
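A load step along these lines could be written with `DataFrame.to_sql`, reading the ratings CSV in chunks so the whole file never has to sit in memory. The function name, the table names `movies` and `ratings`, and the chunk size are hypothetical. The usage example runs against an in-memory SQLite connection purely as a stand-in; for PostgreSQL, `con` would instead be a SQLAlchemy engine pointing at the target database.

```python
import sqlite3

import pandas as pd


def load_to_database(movies_df, ratings_path, con, chunksize=1_000_000):
    """Write movies_df and the ratings CSV to SQL tables.

    Table names "movies" and "ratings" are illustrative assumptions.
    Returns the number of ratings rows written.
    """
    movies_df.to_sql(name="movies", con=con, if_exists="replace", index=False)

    rows_imported = 0
    for i, chunk in enumerate(pd.read_csv(ratings_path, chunksize=chunksize)):
        # Replace the table on the first chunk, then append the rest.
        mode = "replace" if i == 0 else "append"
        chunk.to_sql(name="ratings", con=con, if_exists=mode, index=False)
        rows_imported += len(chunk)

    return rows_imported


if __name__ == "__main__":
    # Stand-in connection for demonstration only, not PostgreSQL.
    demo_con = sqlite3.connect(":memory:")
    demo_movies = pd.DataFrame({"imdb_id": ["tt0111161"], "title": ["Example"]})
    demo_movies.to_sql(name="movies", con=demo_con, if_exists="replace", index=False)
    print(pd.read_sql("SELECT * FROM movies", demo_con))
```

Replacing the table on the first chunk makes the load repeatable: running the pipeline again does not duplicate the rows already stored.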

Example Code - Database Creation
