big-data

Here are 3,997 public repositories matching this topic...

risingwavelabs / risingwave

SQL stream processing, analytics, and management. We decouple storage and compute to offer instant failover, dynamic scaling, speedy bootstrapping, and efficient joins.

Updated May 21, 2024
Rust

StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.

Updated May 21, 2024
Java

apache / ignite-3

Apache Ignite 3

iot cloud sql database big-data cache ignite network-server in-memory-database data-management-platform network-client distributed-sql-database in-memory-computing

Updated May 21, 2024
Java

ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

sql big-data spark clickhouse distributed-database lakehouse olap-database ytsaurus

Updated May 21, 2024
C++

wLUOw / MA234_Course_Project

data-science big-data course-project sustech ma234

Updated May 21, 2024
Python

vespa-engine / vespa

AI + Data, online. https://vespa.ai

java search-engine machine-learning big-data ai server cpp tensorflow vespa serving serving-recommendation vector-search

Updated May 21, 2024
Java

apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

big-data spark flink real-time-analytics data-ingestion table-store paimon streaming-datalake

Updated May 21, 2024
Java

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

java distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine iceberg datalake prestodb trino delta-lake

Updated May 21, 2024
Java

apache / ignite

Apache Ignite

iot cloud sql database big-data hadoop cache osgi ignite network-server in-memory-database data-management-platform network-client distributed-sql-database in-memory-computing

Updated May 21, 2024
Java

fangvv / Homepage

System and Network Lab, School of Computer and Information Technology, Beijing Jiaotong University https://fangvv.github.io/Homepage/

Updated May 21, 2024
JavaScript

hablapps / doric

Type safety for Spark columns

scala big-data spark typesafe big dataframe spark-columns

Updated May 21, 2024
Scala

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated May 21, 2024
Scala

hortonworks / cloudbreak

CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.

java cloud big-data hadoop deployment cloudera hacktoberfest

Updated May 21, 2024
Java

ingef / conquery

Visual, interactive queries against big databases

java big-data big-data-analytics

Updated May 21, 2024
Java

questdb / questdb

An open source time-series database for fast ingest and SQL queries

java iot postgres sql database big-data time-series analytics cpp grafana postgresql simd low-latency financial-analysis tsdb hacktoberfest time-series-database questdb

Updated May 21, 2024
Java

catboost / catboost

A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks for Python, R, Java, and C++. Supports computation on CPU and GPU.

python data-science machine-learning data-mining tutorial r big-data gpu cuda kaggle gbdt gbm gpu-computing decision-trees gradient-boosting coreml catboost categorical-features

Updated May 21, 2024
Python

AbsaOSS / pramen

Resilient data pipeline framework running on Apache Spark

scala big-data spark etl hacktoberfest data-pipeline

Updated May 21, 2024
Scala

apache / ozone

Scalable, redundant, and distributed object store for Apache Hadoop

kubernetes big-data hadoop storage s3 object-store

Updated May 21, 2024
Java

apache / flink

Apache Flink

python java scala sql big-data flink

Updated May 21, 2024
Java

dimajix / flowman

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

scala sql big-data spark apache-spark hadoop etl bigdata data-engineering flowman

Updated May 21, 2024
Scala
