Deploying a production-ready environment for a Spark cluster
Updated Oct 30, 2022 - HCL
Project for the course "Criando um Ecossistema Hadoop Totalmente Gerenciado com Google Cloud Dataproc" ("Building a Fully Managed Hadoop Ecosystem with Google Cloud Dataproc") from the Digital Innovation One Data Engineer Bootcamp
Create a Google Cloud Dataproc cluster with gcloud using this GitHub Action
Determines which words occur in a dataset of textbooks and counts how often each word occurs, using a Dataproc cluster on Google Cloud Platform.
Kaggle: Outbrain Click Prediction (Oct 2016 - Jan 2017)
Trains a classification model as a Dataproc job and uses a Kafka/Pub/Sub connector to serve real-time predictions from the pre-trained models
Material on how to build big data ecosystems in the cloud
A PySpark job that runs on a Dataproc cluster and loads data from Cloud Storage into a BigQuery table.
Building a fully managed Hadoop ecosystem with Google Cloud Platform: the challenge is to process data using GCP's Dataproc service. The job counts the words in a book and reports how many times each word appears in it.
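The word-count task described above is the classic map-reduce example. The sketch below is a minimal, local, pure-Python illustration of the three phases (map, shuffle, reduce), not code from either repository; it assumes a book arrives as an iterable of text lines and that a "word" is any run of Unicode word characters after lowercasing. On a Dataproc cluster the same steps would typically be expressed with PySpark's `flatMap`, `map` and `reduceByKey`.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word token in one line.
    # Assumption: words are runs of \w characters, compared case-insensitively.
    return [(word, 1) for word in re.findall(r"\w+", line.lower())]


def shuffle_phase(pairs):
    # Shuffle: group all emitted counts by word, as the framework
    # does between the map and reduce stages.
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    # Reduce: sum the partial counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    # Run the map phase over every line, then shuffle and reduce.
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["The cat and the hat", "the end"]))
```

Running it prints `{'the': 3, 'cat': 1, 'and': 1, 'hat': 1, 'end': 1}`. Separating the phases this way mirrors how Spark distributes the map step across executors and only moves data during the shuffle.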
Data science project: predicting voter turnout in US swing states from 2020 general election data using big data analytics
A Scala/Spark project for experimenting with map-reduce algorithms on large, graph-shaped data
Yelp ETL Pipeline in Apache Spark on Google Cloud Dataproc
Collection of personal resources on Google Cloud
Data Workflows with GCP Dataproc, Apache Airflow and Apache Spark
GKE with Terraform, Dataproc with Terraform
Customisable high-availability (HA) Dataproc cluster on Debian 9 with ZooKeeper, Kafka, BigQuery and other tools/jobs, provisioned with Terraform
An educational project that builds an end-to-end pipeline for near-real-time and batch data processing, with the output used for visualisation and a machine learning model.
GCP_Data_Enginner