#

dataproc-cluster

Here are 22 public repositories matching this topic...

Wittline / pyDag

Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

bigquery cloud big-data workflow-engine google-cloud data-engineering task-scheduler google-cloud-platform dataproc-cluster dag parallel-processing data-pipeline dataengineering dataproc directed-acyclic-graph task-scheduling

Updated Sep 19, 2022
Python

naranjja / gcp-jupyter-sql

Run Jupyter Notebooks (and store data) on Google Cloud Platform.

jupyter-notebook dataproc-cluster cloud-sql compute-engine

Updated Oct 6, 2017
Python

anjijava16 / GCP_Data_Enginner_Utils

GCP_Data_Enginner

python bigquery scala notebook gcp pubsub pyspark dataflow shell-script dataproc-cluster dataproc gcp-storage big-data-processing

Updated Sep 4, 2021
Shell

spotify / limbo

scala spark google-cloud google-cloud-dataflow dataproc-cluster

Updated Jan 2, 2017
Scala

MarieeCzy / METAR-Data-Engineering-and-Machine-Learning-Project

An educational project to build an end-to-end pipline for near real-time and batch processing of data further used for visualisation and a machine learning model.

python docker bigquery machine-learning looker big-data spark terraform pyspark dataproc-cluster googlecloudplatform dataproc prefect streamlit

Updated May 19, 2023
Python

dwaiba / dataproc-terraform

Dataproc Customisable HA cluster debian-9 with zookeeper,kafka ,BigQuery and other tools/jobs with Terraform

google-cloud-platform dataproc-cluster dataproc dataproc-clusters terraform-gcp gcp-dataproc terraform-dataproc dataproc-ha-terraform

Updated Feb 29, 2020
HCL

cloudgear-io / gke-terraform

gke with terraform, dataproc with terraform

Updated Mar 11, 2020
HCL

jaiswalanshul / gcp_dataproc_spark_airflow

Data Workflows with GCP Dataproc, Apache Airflow and Apache Spark

airflow spark gcp dataproc-cluster dataproc airflow-operators

Updated Mar 4, 2020
Python

bilalsp / yelp_etl

Yelp ETL Pipeline in Apache Spark on Google Cloud Dataproc

bigquery circleci spark apache-spark gcs dataproc-cluster etl-pipeline

Updated Jul 10, 2021
Jupyter Notebook

pietrocarbo / scala-ble

A Scala Spark based project to experiment with map-reduce algorithms on big data graph shaped

scala big-data spark apache-spark yarn hadoop cluster hdfs mapreduce google-cloud-platform dataproc-cluster triangle-counting friend-recommendation

Updated Jul 13, 2018
Scala

mr-ubik / google-nembo

Collection of personal resources on Google Cloud

docker google cloud tensorflow keras google-cloud datascience google-cloud-platform nvidia-docker dataproc-cluster google-compute-engine dataproc

Updated Dec 1, 2017

akaliutau / gcp-prod-spark-cluster

Deploying production ready environment for Spark cluster

devops terraform gcp dataproc-cluster custom-image

Updated Oct 30, 2022
HCL

vishnudxb / gcloud-dataproc-creation

Creating gcloud dataproc cluster with this github action

testing big-data google-cloud pyspark spark-streaming dataproc-cluster

Updated Oct 18, 2020
Shell

tirthmehta / Google-Cloud-Platform-based-Hadoop-Map-Reduce

Determination of which words occur in a dataset of textbooks along with each word's occurrence count identification with the help of Google Cloud Platform based Dataproc cluster formation.

java dataproc-cluster crawler4j googlecloud dataprocessing googlecloudplatform dataproc

Updated Jul 28, 2017
Java

jonathanAmancioSales / Hadoop_Dataproc_Google_Cloud_Platform_DIO

Projeto do Curso "Criando um Ecossistema Hadoop Totalmente Gerenciado com Google Cloud Dataproc" do Bootcamp Data Engineer da Digital Innovation One

hadoop google-cloud pyspark dataproc-cluster google-cloud-dataproc

Updated Aug 21, 2021
Shell

natmurad / cloudbigdata

Content about how to create big data ecosystems on the Cloud

aws aws-s3 google-cloud data-engineering aws-ec2 dataproc-cluster aws-firehose

Updated Aug 28, 2021
HTML

jjtoharia / Kaggle_Outbrain

Kaggle - Outbrain Click Prediction (Oct-2016 - Jan-2017)

python r spark python3 kaggle xgboost dataproc-cluster lstm-neural-networks

Updated Apr 21, 2017
R

vasisthasinghal / Yelp-Review-Classification

Training a classification model as a Dataproc Job and using Kafka/PubSub connector for real-time prediction using pre-trained models

gcp pubsub pyspark apache-kafka google-cloud-platform dataproc-cluster big-data-analytics pyspark-mllib

Updated Oct 11, 2020
Jupyter Notebook

mihir-robotics / pyspark-gcp-project

PySpark Job that runs in Dataproc cluster, loads data from Cloud Storage to BigQuery table.

bigquery google-cloud dataproc-cluster pyspark-python

Updated Feb 15, 2024
Python

qiguan1 / Advanced-Churn-Prediction-GCP-Apache-Spark

machine-learning apache-spark random-forest dataproc-cluster churn-prediction gcp-cloud-functions

Updated Jun 26, 2021
Jupyter Notebook

Improve this page

Add a description, image, and links to the dataproc-cluster topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the dataproc-cluster topic, visit your repo's landing page and select "manage topics."