# Music data analyzer scheduler

This project showcases how to design and schedule a series of jobs/steps using Apache Airflow to:

- Backfill data
- Build a dimensional data model using Python
- Load data from an AWS S3 bucket into an AWS Redshift data warehouse
- Run quality checks on the data
- Use or create custom operators and the available hooks to keep the code reusable (a sketch of such an operator follows this list)
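
For illustration, here is a minimal sketch of what one of these reusable pieces could look like: a custom data-quality operator that fails its task when a target table is empty. The class and argument names are hypothetical (the repo's actual operators may differ), and the imports assume Airflow 2.x with the Postgres provider package installed:

```python
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class DataQualityOperator(BaseOperator):
    """Illustrative custom operator: fails the task if any given table is empty."""

    def __init__(self, redshift_conn_id="redshift", tables=None, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id  # Airflow connection id for Redshift
        self.tables = tables or []

    def execute(self, context):
        # PostgresHook works against Redshift since it speaks the Postgres protocol
        redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        for table in self.tables:
            records = redshift.get_records(f"SELECT COUNT(*) FROM {table}")
            if not records or records[0][0] < 1:
                raise ValueError(f"Data quality check failed: {table} is empty")
            self.log.info("Data quality check on %s passed with %s rows", table, records[0][0])
```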

## Running the DAG

You can run the DAG on your own machine using `docker-compose`. To use `docker-compose`, you must first install Docker. Once Docker is installed:

1. Open a terminal in the same directory as `docker-compose.yml`
2. Run `docker-compose up`
3. Wait 30-60 seconds for the services to start
4. Open http://localhost:8080 in Google Chrome (other browsers occasionally have issues rendering the Airflow UI)
5. Make sure you have configured the `aws_credentials` and `redshift` connections in the Airflow UI (the sketch below shows how the DAG consumes them)
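
If you want to sanity-check the connections from step 5 before triggering the DAG, the snippet below shows how operator code typically reads them by name. This is an illustrative sketch, not code from this repo; imports assume Airflow 2.x with the Amazon and Postgres provider packages installed:

```python
from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
from airflow.providers.postgres.hooks.postgres import PostgresHook

# The conn ids must match the names configured in the Airflow UI (step 5).
aws_hook = AwsBaseHook(aws_conn_id="aws_credentials", client_type="s3")
credentials = aws_hook.get_credentials()  # access key / secret stored in the connection

redshift = PostgresHook(postgres_conn_id="redshift")
redshift.run("SELECT 1")  # cheap connectivity check against the cluster
```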

When you are ready to quit Airflow, hit `Ctrl+C` in the terminal where `docker-compose` is running, then run `docker-compose down`.
