This repository contains Airflow Directed Acyclic Graphs (DAGs) and associated scripts for orchestrating an Extract, Transform, Load (ETL) workflow. The workflow is designed to extract data from a source, perform transformations, and load it into a data warehouse.
The ETL workflow consists of the following components:
- DAGs: Airflow DAGs define the workflow's structure and task dependencies.
- Scripts: Python scripts used by Airflow tasks for data extraction, transformation, and loading.
- SQL Scripts: SQL scripts for database operations, such as creating tables or performing Slowly Changing Dimension (SCD) updates.
This DAG orchestrates the full ETL workflow, including building dimension tables and loading the data warehouse.
Tasks:
- `Build_Dimantions`: Builds dimension tables using SQL scripts.
- `Extract_v1`: Extracts data from the source system.
- `Transform_v1`: Transforms extracted data using Python scripts.
- `Load_v1`: Loads transformed data into the data warehouse.
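The run order of these tasks can be modeled in plain Python, as a rough sketch. This is not the Airflow DAG file itself: it uses the standard library's `graphlib` to resolve dependencies, and the linear chain (dimension build first, then extract, transform, load) is an assumption rather than something read from the DAG definition.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks that must finish before it starts.
# The linear chain below is an assumption, not taken from the DAG file.
upstream = {
    "Build_Dimantions": set(),
    "Extract_v1": {"Build_Dimantions"},
    "Transform_v1": {"Extract_v1"},
    "Load_v1": {"Transform_v1"},
}

# Resolve an execution order that respects every dependency.
run_order = list(TopologicalSorter(upstream).static_order())
print(run_order)
```

In an actual Airflow DAG the same ordering would typically be declared between operators with the `>>` operator.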
This DAG focuses on the ETL process, excluding dimension table builds.
Tasks:
- `Extract_v1`: Extracts data from the source system.
- `Transform_v1`: Transforms extracted data using Python scripts.
- `Load_v1`: Loads transformed data into the data warehouse.
Python script for building dimension tables in the data warehouse.
Python script for extracting data from the source system.
Python script for transforming extracted data.
Python script for loading transformed data into the data warehouse.
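To illustrate how the extract, transform, and load scripts might fit together, here is a minimal sketch. Everything in it is hypothetical: the function names, the CSV source, the column names, and the in-memory SQLite database standing in for the data warehouse are assumptions chosen so the example runs on its own.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Read source rows from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Normalize customer names and compute a line total for each row."""
    return [
        {
            "customer": row["customer"].strip().title(),
            "total": int(row["quantity"]) * float(row["unit_price"]),
        }
        for row in rows
    ]


def load(rows, conn):
    """Insert transformed rows into a hypothetical sales table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO sales (customer, total) VALUES (:customer, :total)", rows
    )
    conn.commit()


# Hypothetical sample data; SQLite stands in for the real warehouse.
source = "customer,quantity,unit_price\n alice ,2,1.50\nBOB,1,4.00\n"
conn = sqlite3.connect(":memory:")
load(transform(extract(source)), conn)
loaded = conn.execute("SELECT customer, total FROM sales ORDER BY customer").fetchall()
print(loaded)
```

Keeping each stage as a separate function with plain inputs and outputs makes it straightforward to wrap each one in its own Airflow task.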
- `dimproduct.sql`: SQL script for Slowly Changing Dimension (SCD) operations on the product dimension.
- `dimcustomer.sql`: SQL script for SCD operations on the customer dimension.
- `fact_sales.sql`: SQL script for loading data into the fact table.
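The SQL scripts themselves are not reproduced here, and this README does not say which SCD type they implement. As a rough sketch only, the example below shows what a Type 2 update on a product dimension could look like, assuming a hypothetical `dim_product` table with validity columns and using SQLite in place of the real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the real dimproduct.sql may use different columns.
conn.execute("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        product_id  TEXT NOT NULL,                      -- business key
        price       REAL NOT NULL,
        valid_from  TEXT NOT NULL,
        valid_to    TEXT,                               -- NULL while current
        is_current  INTEGER NOT NULL
    )
""")


def apply_scd2(conn, product_id, price, as_of):
    """Close the current version if the price changed, then insert a new one."""
    current = conn.execute(
        "SELECT price FROM dim_product WHERE product_id = ? AND is_current = 1",
        (product_id,),
    ).fetchone()
    if current is not None and current[0] == price:
        return  # nothing changed, keep the current version
    # Expire the existing current row (a no-op for a brand-new product).
    conn.execute(
        "UPDATE dim_product SET valid_to = ?, is_current = 0 "
        "WHERE product_id = ? AND is_current = 1",
        (as_of, product_id),
    )
    # Insert the new version as the current row.
    conn.execute(
        "INSERT INTO dim_product (product_id, price, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (product_id, price, as_of),
    )
    conn.commit()


apply_scd2(conn, "P-1", 9.99, "2024-01-01")
apply_scd2(conn, "P-1", 9.99, "2024-02-01")   # unchanged, no new row
apply_scd2(conn, "P-1", 12.50, "2024-03-01")  # changed, new version
history = conn.execute(
    "SELECT price, valid_from, valid_to, is_current "
    "FROM dim_product ORDER BY product_key"
).fetchall()
print(history)
```

With Type 2, history is preserved: the old price stays in the table with a closed validity range, and fact rows can join to whichever version was current at the time of the sale.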
- Install Apache Airflow and configure the Airflow environment.
- Clone this repository.
- Place the DAG files in the Airflow DAGs directory (`$AIRFLOW_HOME/dags`).
- Execute the DAGs using the Airflow UI or CLI.
- Monitor the DAG runs and task executions in the Airflow UI.
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.