"CloudScore Analytics: GCP-Driven Soccer Data Engineering Suite" is a data engineering project that leverages Google Cloud Platform for end-to-end management of soccer statistics. Python scripts extract data from API Football and run a dedicated data cleaning phase to ensure data quality and integrity before the processed files are stored in GCP Cloud Storage. Google Dataflow then loads the data into BigQuery tables, with Cloud Functions triggering the transformation automatically, and Tableau provides insightful visualizations on top. Orchestrated daily by Cloud Composer, the project demonstrates end-to-end data pipeline automation and analytics in the cloud.
- Python
- Google Cloud Platform (GCP) components:
  - Cloud Storage
  - Cloud Functions
  - Dataflow
  - BigQuery
  - Cloud Composer
- Tableau for Visualization
- API Football as Data Source
- Automated Data Extraction: Seamlessly pull data from API Football.
- GCP-Powered Transformation and Storage: Leverage GCP services for efficient data handling.
- Daily Orchestrated Data Pipeline Runs: Ensure consistent data flow with Cloud Composer.
- Interactive Tableau Visualizations: Create dynamic visual representations of data.
To get this project up and running, follow these steps:
- Clone the repository:
git clone https://github.com/zacharyvunguyen/CloudScore-Analytics--GCP-Driven-Soccer-Data-Engineering-Suite.git
cd CloudScore-Analytics--GCP-Driven-Soccer-Data-Engineering-Suite
- [Further steps regarding setting up GCP services, configuring API keys, etc.]
This suite comprises several steps to automate soccer data management using GCP. Follow these steps to set up the entire workflow:
- Initialize Cloud Storage Bucket: Execute the script to create a new bucket in Cloud Storage.
python Create_Cloud_Storage_Bucket.py
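The repository's script isn't reproduced here, but a minimal sketch of what Create_Cloud_Storage_Bucket.py might contain, assuming the google-cloud-storage client library and hypothetical project and bucket names:

```python
# Minimal sketch (assumed implementation) of Create_Cloud_Storage_Bucket.py.
# Requires: pip install google-cloud-storage
from google.cloud import storage

PROJECT_ID = "my-gcp-project"           # hypothetical project ID
BUCKET_NAME = "cloudscore-soccer-data"  # hypothetical bucket name

def create_bucket() -> None:
    client = storage.Client(project=PROJECT_ID)
    bucket = client.create_bucket(BUCKET_NAME, location="US")  # assumed region
    print(f"Created bucket {bucket.name} in {bucket.location}")

if __name__ == "__main__":
    create_bucket()
```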
- Data Extraction and Storage: Run the script to fetch soccer data from API Football, perform data cleaning, and store the processed data in Cloud Storage.
python Fetching_API_Data_to_CloudStorage.py
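A hedged sketch of how Fetching_API_Data_to_CloudStorage.py could work; the fixtures endpoint, environment variable, cleaning rules, bucket, and paths are all illustrative assumptions, not the repository's exact code:

```python
# Sketch of Fetching_API_Data_to_CloudStorage.py (assumed details throughout).
# Requires: pip install requests pandas google-cloud-storage
import os

import pandas as pd
import requests
from google.cloud import storage

API_URL = "https://v3.football.api-sports.io/fixtures"  # API-Football v3 fixtures endpoint
API_KEY = os.environ["API_FOOTBALL_KEY"]                # assumed env variable for the key
BUCKET_NAME = "cloudscore-soccer-data"                  # hypothetical bucket

def fetch_fixtures(league: int, season: int) -> pd.DataFrame:
    resp = requests.get(
        API_URL,
        headers={"x-apisports-key": API_KEY},
        params={"league": league, "season": season},
        timeout=30,
    )
    resp.raise_for_status()
    # Flatten the nested JSON payload into tabular columns.
    return pd.json_normalize(resp.json()["response"])

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Example cleaning only: drop duplicates and rows missing a fixture ID.
    df = df.drop_duplicates()
    return df.dropna(subset=["fixture.id"])

def upload(df: pd.DataFrame, blob_name: str) -> None:
    bucket = storage.Client().bucket(BUCKET_NAME)
    bucket.blob(blob_name).upload_from_string(
        df.to_csv(index=False), content_type="text/csv"
    )

if __name__ == "__main__":
    fixtures = clean(fetch_fixtures(league=39, season=2023))
    upload(fixtures, "raw/fixtures.csv")
```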
- BigQuery Dataset and Table Setup: Create a BigQuery dataset and table using the following script.
python Create_BigQuery_Dataset_and_Table_No_Schema.py
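Create_BigQuery_Dataset_and_Table_No_Schema.py could look roughly like the sketch below, assuming the google-cloud-bigquery client; the dataset and table names are hypothetical, and no explicit schema is set, matching the script's name:

```python
# Sketch of Create_BigQuery_Dataset_and_Table_No_Schema.py
# (project, dataset, and table names are hypothetical).
# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

PROJECT_ID = "my-gcp-project"  # hypothetical
DATASET_ID = "soccer_stats"    # hypothetical
TABLE_ID = "fixtures"          # hypothetical

def main() -> None:
    client = bigquery.Client(project=PROJECT_ID)

    # Create the dataset if it does not already exist.
    dataset = bigquery.Dataset(f"{PROJECT_ID}.{DATASET_ID}")
    dataset.location = "US"  # assumed region
    client.create_dataset(dataset, exists_ok=True)

    # Create the table with no schema; the Dataflow load step
    # defines the columns later.
    table = bigquery.Table(f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}")
    client.create_table(table, exists_ok=True)
    print(f"Dataset {DATASET_ID} and table {TABLE_ID} are ready.")

if __name__ == "__main__":
    main()
```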
- Prepare Dataflow Metadata: Set up the necessary parameters for Dataflow by running:
python Prepare_Dataflow_Parameters.py
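The exact parameters depend on the Dataflow job used. One plausible sketch of Prepare_Dataflow_Parameters.py writes them to a JSON file that the later launch steps read; every value below is illustrative:

```python
# Sketch of Prepare_Dataflow_Parameters.py: writes the job parameters that
# later steps (and the Cloud Function) read when launching Dataflow.
# All names and paths are illustrative assumptions.
import json

PARAMS = {
    "project": "my-gcp-project",                       # hypothetical
    "region": "us-central1",                           # assumed region
    "inputFilePattern": "gs://cloudscore-soccer-data/raw/*.csv",
    "outputTable": "my-gcp-project:soccer_stats.fixtures",
    "tempLocation": "gs://cloudscore-soccer-data/temp/",
}

if __name__ == "__main__":
    with open("dataflow_parameters.json", "w") as f:
        json.dump(PARAMS, f, indent=2)
    print("Wrote dataflow_parameters.json")
```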
- Dataflow Job Execution: Launch the Dataflow job that loads the cleaned files from Cloud Storage into the BigQuery table (a sketch follows).
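One way to launch such a job is through the Dataflow templates REST API; the template path below is a hypothetical placeholder for the repository's actual job definition, and the parameters come from the JSON file written in the previous step:

```python
# Sketch: launch a Dataflow job from a template via the Dataflow REST API.
# The template path and parameter names are assumptions, not the repo's exact job.
# Requires: pip install google-api-python-client
import json

from googleapiclient.discovery import build

with open("dataflow_parameters.json") as f:
    params = json.load(f)

dataflow = build("dataflow", "v1b3", cache_discovery=False)
request = dataflow.projects().locations().templates().launch(
    projectId=params["project"],
    location=params["region"],
    gcsPath="gs://my-gcp-project/templates/cloudscore-template",  # hypothetical template path
    body={
        "jobName": "cloudscore-gcs-to-bq",
        "parameters": {
            "inputFilePattern": params["inputFilePattern"],
            "outputTable": params["outputTable"],
        },
        "environment": {"tempLocation": params["tempLocation"]},
    },
)
print(request.execute())
```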
- Automate Dataflow with Cloud Function: Deploy a Cloud Function to trigger Dataflow jobs automatically when new files are uploaded to Cloud Storage (see the sketch below).
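A hedged sketch of such a Cloud Function (1st-gen Python runtime, fired on the google.storage.object.finalize event); the project, region, and template path are assumptions:

```python
# Sketch of a GCS-triggered Cloud Function that launches a Dataflow
# template job for each newly uploaded file.
# Names, template path, and parameters are illustrative assumptions.
from googleapiclient.discovery import build

PROJECT = "my-gcp-project"  # hypothetical
REGION = "us-central1"      # assumed
TEMPLATE = "gs://my-gcp-project/templates/cloudscore-template"  # hypothetical

def trigger_dataflow(event, context):
    """Background Cloud Function: fires on google.storage.object.finalize."""
    file_path = f"gs://{event['bucket']}/{event['name']}"
    dataflow = build("dataflow", "v1b3", cache_discovery=False)
    dataflow.projects().locations().templates().launch(
        projectId=PROJECT,
        location=REGION,
        gcsPath=TEMPLATE,
        body={
            "jobName": f"cloudscore-{context.event_id}",
            "parameters": {"inputFile": file_path},
        },
    ).execute()
```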
- Orchestrate Data Pipeline with Cloud Composer: Deploy a Cloud Composer (Apache Airflow) DAG that runs the pipeline on a daily schedule, as sketched below.
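A minimal sketch of such a daily DAG, assuming Airflow 2.x on Cloud Composer; the DAG ID and script paths under the Composer DAGs mount are illustrative:

```python
# Sketch of a daily Cloud Composer (Airflow 2.x) DAG for the pipeline.
# The DAG ID and script paths are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="cloudscore_daily_pipeline",  # hypothetical DAG ID
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    prepare = BashOperator(
        task_id="prepare_dataflow_parameters",
        bash_command="python /home/airflow/gcs/dags/Prepare_Dataflow_Parameters.py",
    )
    fetch = BashOperator(
        task_id="fetch_api_data",
        bash_command="python /home/airflow/gcs/dags/Fetching_API_Data_to_CloudStorage.py",
    )
    # The upload from the fetch step triggers the Cloud Function,
    # which in turn launches the Dataflow load into BigQuery.
    prepare >> fetch
```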