A curated list of awesome big data frameworks, resources and other awesomeness.
Updated Apr 19, 2024
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Open-Source Web UI for Apache Kafka Management
Fancy stream processing made operationally mundane
The data warehouse for operational workloads.
🌊 Online machine learning in Python
Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, Readyset caches the results of SELECT statements and incrementally updates these results as the underlying data changes.
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Lean and mean distributed stream processing system written in Rust and WebAssembly.
Open-source graph database, tuned for dynamic analytics environments. Easy to adopt, scale and own.
Pravega - Streaming as a new software defined storage primitive
A lightweight stream processing library for Go
Trill is a single-node query processor for temporal or streaming data.
Real-time stream processing for Python
Python Stream Processing
📐 Pushing the boundaries of simplicity
⚡ Single-pass algorithms for statistics
A machine learning package for streaming data in Python. The other ancestor of River.
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
A list about Apache Kafka