rdd
Here are 189 public repositories matching this topic...
Streaming data in Spark and performing data analytics
Updated Sep 19, 2019 - Python
CS651 Final Project
Updated Jan 3, 2022 - Jupyter Notebook
PageRank - Pig vs PySpark comparison https://madoc.univ-nantes.fr/mod/assign/view.php?id=1511791
Updated Oct 20, 2022 - Python
Analysis of a college student dataset using Spark RDD. Demo of various RDD operations such as countByValue, groupBy, groupByKey, and reduceByKey. Demo of map, flatMap, split, explicit, filter, type conversion, sum, count, distinct, aggregate, and the length of an RDD. Demo of string and int comparisons, and of union and intersection on RDDs.
Updated Sep 5, 2018
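As a rough illustration of what a few of the operations named in that description compute, here is a plain-Python sketch of their semantics. It does not use Spark: the helper functions and the sample records are hypothetical stand-ins that only mimic the results of the RDD methods with the corresponding names, and they run on a single machine rather than a cluster.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (student, course) pairs.
records = [
    ("asha", "math"),
    ("ben", "physics"),
    ("asha", "physics"),
    ("cara", "math"),
]

def count_by_value(items):
    """Mimics countByValue: each distinct element mapped to its number of occurrences."""
    return dict(Counter(items))

def group_by_key(pairs):
    """Mimics groupByKey: collect every value that shares a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_by_key(pairs, func):
    """Mimics reduceByKey: fold the values of each key with a binary function."""
    reduced = {}
    for key, value in pairs:
        reduced[key] = func(reduced[key], value) if key in reduced else value
    return reduced

def flat_map(func, items):
    """Mimics flatMap: apply func to each element and flatten the results."""
    return [out for item in items for out in func(item)]

# How many enrolments each course has.
course_counts = count_by_value(course for _, course in records)

# Which courses each student takes.
courses_per_student = group_by_key(records)

# Number of enrolments per student, via (student, 1) pairs summed per key.
enrolments = reduce_by_key([(name, 1) for name, _ in records], lambda a, b: a + b)

# Split lines into words and flatten into one list.
words = flat_map(str.split, ["spark rdd", "rdd demo"])
```

In Spark itself these operations are distributed across partitions, and reduceByKey combines values within each partition before shuffling, which is why it is usually preferred over groupByKey followed by a manual reduction.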
This project covers a range of fundamental operations on Resilient Distributed Datasets (RDDs) and DataFrames, along with an exploration of a Big Recommender Dataset using Apache Spark's powerful tools.
Updated Aug 22, 2023 - Jupyter Notebook
This repository contains tutorials on natural language processing, machine learning, ontology creation, querying ontologies using DL-Query, and implementing a question-answering system
Updated Jul 29, 2017 - JavaScript
Python RDD notebook in Apache Spark
Updated Dec 8, 2018 - Jupyter Notebook
Solved assignments for Coursera's Fundamentals of Scalable Data Science course
Updated Apr 22, 2020 - Jupyter Notebook