# pyspark

Here are 3,394 public repositories matching this topic...

basel-ay / Hands-on-Apache-Spark

Writing dummy snippets of code to read, manipulate, and build a simple ML model with PySpark.

apache-spark linear-regression pyspark

Updated Jul 18, 2023
Jupyter Notebook

zuliani99 / All-Pairs-Docs-Similarity

Given a set of documents and a minimum required similarity threshold, find the number of document pairs whose similarity exceeds the threshold.

sklearn pyspark tf-idf cosine-similarity document-similarity beir

Updated May 26, 2023
Jupyter Notebook

phricardorj / pyspark-study

🐍 | My PYSPARK studies. PySpark is an interface for Apache Spark in Python.

Updated Nov 11, 2022
Jupyter Notebook

JonathanPollyn / Spark

This notebook contains detailed code for Spark, machine learning, and Databricks.

python spark pyspark spark-sql pyspark-python

Updated Mar 15, 2023
Jupyter Notebook

data-miner00 / spark

A laboratory to carry out experiments with PySpark

python pyspark databricks

Updated Nov 5, 2023
Jupyter Notebook

furkancets / PrescreiberPipelineSpark

Exploring a best-case Apache Spark working environment for robust data pipelines

spark apache-spark hadoop pyspark

Updated Apr 1, 2023
Python

simonediluna / Distributed-Data-Analysis-and-Mining

An academic project carried out for the Distributed Data Analysis and Mining course (a. y. 2022/2023)

distributed-systems data-science pyspark

Updated May 18, 2023
Jupyter Notebook

samanta-anupam / big_data_assignments

Assignments given in the CSE545 course.

lsh pyspark dimensionality-reduction svd word-count satellite-images blog-corpus

Updated Dec 3, 2017
Jupyter Notebook

vamshitalla / python

python spark datascience pyspark

Updated Aug 12, 2018
Python

SitiBanc / 1061-Data-Mining

1061 Data Mining Research and Practice homework

python data-mining scikit-learn python3 kaggle pyspark

Updated Oct 2, 2018
Jupyter Notebook

kovean1994 / live-comment-spark

LDA clustering of live video comments with Spark MLlib

python spark pyspark

Updated Jun 16, 2017

danielsqtang / Slowly-Changing-Dimension-Type-2

Code for creating an SCD Type 2 in pyspark2

hive pyspark scd

Updated Feb 26, 2018

Thelin90 / stockmarket-ml-research

Stock market machine learning

ai pandas pyspark stockmarket iexfinance-api

Updated May 30, 2018
Python

abeasock / tutorials_open_source

Repo contains various tutorials I've created to help people learn Python and other tools

python machine-learning pyspark zeppelin

Updated Jun 27, 2019
Jupyter Notebook

JayLohokare / pySpark-30mins-training

A 30 mins pySpark training crash-course I created

tutorial pyspark

Updated Aug 14, 2018
Jupyter Notebook

sumanthvrao / IPL-Spark-Analysis

Predict outcomes of IPL cricket matches for the year 2018 using the Spark MLlib framework.

spark pyspark decision-tree kmeans-clustering spark-mllib-library

Updated Nov 19, 2020
Jupyter Notebook

LeePleased / Pokemon-OneShotLearning

Leverages parallel Python Spark computation, built on BigDL (Intel's deep learning library), to solve one-shot learning on a Pokémon dataset with a Siamese network.

pokemon pyspark bigdl oneshotlearning

Updated Jan 1, 2019
Python

untaljohanperez / spark_basics

data-science spark pandas data-visualization pyspark data-analysis pyplot

Updated Mar 27, 2020
Jupyter Notebook

hcvazquez / ht-engineering

An engineering process for data science and big data processing

python data-science machine-learning airflow big-data apache-spark pyspark data-engineering jupiter-notebook

Updated Dec 8, 2022
Jupyter Notebook

hsm207 / grab-safety

My submission for the Grab AI for S.E.A. challenge

pyspark telematics databricks spark-ml databricks-notebooks

Updated Jun 3, 2019
Jupyter Notebook
