A curated list of awesome big data frameworks, resources and other awesomeness.
Updated May 7, 2024
Open-Source Web UI for Apache Kafka Management
Fancy stream processing made operationally mundane
🌊 Online machine learning in Python
The data warehouse for operational workloads.
Pravega - Streaming as a new software defined storage primitive
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Lean and mean distributed stream processing system written in Rust and WebAssembly.
A machine learning package for streaming data in Python. The other ancestor of River.
Source code for the Kafka Streams in Action Book
A list about Apache Kafka
A lightweight stream processing library for Go
Real-time stream processing for Python
Trill is a single-node query processor for temporal or streaming data.
A real-time interactive web app based on data pipelines using streaming Twitter data, automated sentiment analysis, and MySQL and PostgreSQL databases (deployed on Heroku)
🌲 Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams
Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, Readyset caches the results of cached SELECT statements and incrementally updates these results over time as the underlying data changes.
Optimal binning: monotonic binning with constraints. Supports batch and stream optimal binning, scorecard modelling, and counterfactual explanations.