From 5b7528a9865cb8c2910cc863ae2b9630f204216e Mon Sep 17 00:00:00 2001
From: Kanthi Subramanian
Date: Mon, 11 Nov 2024 19:58:50 -0500
Subject: [PATCH 1/7] Added feature matrix documentation

---
 doc/feature_matrix.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 doc/feature_matrix.md

diff --git a/doc/feature_matrix.md b/doc/feature_matrix.md
new file mode 100644
index 000000000..1428c8fe3
--- /dev/null
+++ b/doc/feature_matrix.md
@@ -0,0 +1,26 @@
+## Features
+
+| Feature | Description |
+| ------- | --------- |
+| Single Binary | No additional dependencies or infrastructure required |
+| Exactly Once Processing| Offsets are committed to ClickHouse after the messages are written to ClickHouse |
+| Supported Databases | MySQL, MariaDB, PostgreSQL, MongoDB(Experimental) |
+| Supported ClickHouse Versions | 24.8 and above |
+| Clickhouse Tables Types | ReplacingMergeTree, MergeTree, ReplicatedReplacingMergeTree |
+| Replication Start positioning | Using sink-connector-client to start replication from a specific offset or LSN(MySQL Binlog Position, PostgreSQL LSN) |
+| Supported Datatypes| Refer [Datatypes](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) |
+| Initial Data load | Scripts to perform initial data load (MySQL) |
+| Fault Tolerance | Sink Connector Client to continue replication from the last committed offset/LSN in case of a failure |
+| Update, Delete | Supported with ReplacingMergeTree
+| Monitoring | Prometheus Metrics, Grafana Dashboard |
+| Schema Evolution| DDL support for MYSQL.
+| Deployment Models| Docker Compose, Java JAR file, Kubernetes
+| Start, Stop, Pause, Resume Replication | Supported using sink-connector-client
+| Filter sources databases, tables, columns | Supported using debezium configuration.
+| Map source databases to different ClickHouse databases | Database name overrides supported.
+| Column name overrides | Planned
+| MySQL extensive DDL support | Full list of DDL(sink-connector-lightweight/docs/mysql-ddl-support.md)
+| Replication Lag Monitoring| Grafana Dashboard and view to monitor lag
+| Batch inserts to ClickHouse | Configurable batch size/thread pool size to achieve high throughput/low latency
+| MySQL Generated/Alias/Materialized Columns | Supported
+| Auto create tables| Tables are automatically created in ClickHouse based on the source table structure.

From 67f3934b7e0913169766815e00f20e1fa8001dd7 Mon Sep 17 00:00:00 2001
From: Kanthi Subramanian
Date: Mon, 11 Nov 2024 20:00:21 -0500
Subject: [PATCH 2/7] Aded link to feature matrix in main README

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 031ae7bce..24cf3ab4d 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ transactional database tables in MySQL and PostgreSQL to ClickHouse for analysis.
 ## Features
+Refer [Feature Matrix](doc/feature_matrix.md) for detailed features.
 * [Initial data dump and load(MySQL)](sink-connector/python/README.md)
 * Change data capture of new transactions using [Debezium](https://debezium.io/)

From 6dbcd9e5384257f703d62323fc787547476e5ed9 Mon Sep 17 00:00:00 2001
From: Kanthi Subramanian
Date: Mon, 11 Nov 2024 21:25:48 -0500
Subject: [PATCH 3/7] Added comparison document to compare other technologies

---
 doc/comparison.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 doc/comparison.md

diff --git a/doc/comparison.md b/doc/comparison.md
new file mode 100644
index 000000000..572068822
--- /dev/null
+++ b/doc/comparison.md
@@ -0,0 +1,15 @@
+| Feature | Altinity Sink Connector (Lightweight, Single Binary) | Airbyte | ClickHouse `mysql` Table Engine | Custom Python Script with ClickHouse Connect |
+|---------------------------------|------------------------------------------------------|--------------------------------|----------------------------------------|-----------------------------------------------|
+| **Replication Type** | Real-time CDC | Batch (Scheduled) | Direct Query | Batch or Scheduled |
+| **Data Freshness** | Near real-time | Configurable (e.g., hourly) | Near real-time (with latency) | Configurable |
+| **Schema Change Handling** | Partial support, some manual config may be required | Manual schema refresh required | No automatic schema sync | Manual intervention needed |
+| **Complexity** | Low to Medium (single binary setup) | Moderate | Low | High (requires coding and scheduling) |
+| **Ease of Setup** | Easy (standalone binary, no Kafka needed) | Easy | Very easy | Complex (custom coding) |
+| **Maintenance** | Low to Moderate (single binary process) | Low | Low | High |
+| **Initial Sync Support** | Yes | Yes | Not applicable (direct query) | Yes |
+| **Transformation Capabilities** | Limited | Basic (Airbyte transformations)| No | Full control (custom code) |
+| **Cost** | Free or license-based | Free (Open-source) | Free (built-in to ClickHouse) | Free (but may require custom infrastructure) |
+| **Suitability for High Volume** | High | Medium | Medium | Medium to Low |
+| **Additional Infrastructure** | None | None | None | Optional (scheduling tools like Airflow) |
+| **Data Accuracy** | High (real-time CDC) | Medium (depends on sync frequency) | Medium | High |
+| **Ideal Use Case** | Low-latency, real-time replication without Kafka | Batch syncs, easy setup | Simple queries without replication | Custom, flexible ETL |

From 2d828d37ca78722fd23e529da45ef72634d34c1b Mon Sep 17 00:00:00 2001
From: Kanthi Subramanian
Date: Mon, 11 Nov 2024 21:27:11 -0500
Subject: [PATCH 4/7] Added comparison document to compare other technologies

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 24cf3ab4d..1028efe4f 100644
--- a/README.md
+++ b/README.md
@@ -73,6 +73,9 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
 * [Development](doc/development.md)
 * [Testing](doc/TESTING.md)
+## Comparison with other technologies
+- [Comparison](doc/comparison.md)
+
 ## Roadmap
 [2024 Roadmap](https://github.com/Altinity/clickhouse-sink-connector/issues/401)

From 361cd4ec8be5e5cddb893ee03efaaed68badc0e2 Mon Sep 17 00:00:00 2001
From: Kanthi Subramanian
Date: Tue, 12 Nov 2024 17:55:21 -0500
Subject: [PATCH 5/7] Added document for data types

---
 doc/comparison.md | 2 +-
 doc/data_types.md | 76 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 1 deletion(-)
 create mode 100644 doc/data_types.md

diff --git a/doc/comparison.md b/doc/comparison.md
index 572068822..16bac83ac 100644
--- a/doc/comparison.md
+++ b/doc/comparison.md
@@ -2,7 +2,7 @@
 |---------------------------------|------------------------------------------------------|--------------------------------|----------------------------------------|-----------------------------------------------|
 | **Replication Type** | Real-time CDC | Batch (Scheduled) | Direct Query | Batch or Scheduled |
 | **Data Freshness** | Near real-time | Configurable (e.g., hourly) | Near real-time (with latency) | Configurable |
-| **Schema Change Handling** | Partial support, some manual config may be required | Manual schema refresh required | No automatic schema sync | Manual intervention needed |
+| **Schema Change Handling** | Full support(MySQL), Partial(PostgreSQL) | Manual schema refresh required | No automatic schema sync | Manual intervention needed |
 | **Complexity** | Low to Medium (single binary setup) | Moderate | Low | High (requires coding and scheduling) |
 | **Ease of Setup** | Easy (standalone binary, no Kafka needed) | Easy | Very easy | Complex (custom coding) |
 | **Maintenance** | Low to Moderate (single binary process) | Low | Low | High |
diff --git a/doc/data_types.md b/doc/data_types.md
new file mode 100644
index 000000000..cab08f7d9
--- /dev/null
+++ b/doc/data_types.md
@@ -0,0 +1,76 @@
+## MySQL Data Types
+Refer [Debezium](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) for detailed data types.
+
+| MySQL | Debezium | ClickHouse |
+|--------------------|------------------------------------------------------|---------------------------------|
+| Bigint | INT64\_SCHEMA | Int64 |
+| Bigint Unsigned | INT64\_SCHEMA | UInt64 |
+| Blob | | String + hex |
+| Char | String | String / LowCardinality(String) |
+| Date | Schema: INT64<br>Name:<br>debezium.Date | Date(6) |
+| DateTime(0/1/2/3) | Schema: INT64<br>Name: debezium.Timestamp | DateTime64(0/1/2/3) |
+| DateTime(4/5/6) | Schema: INT64<br>Name: debezium.MicroTimestamp | DateTime64(4/5/6) |
+| Decimal(30,12) | Schema: Bytes<br>Name:<br>kafka.connect.data.Decimal | Decimal(30,12) |
+| Double | | Float64 |
+| Int | INT32 | Int32 |
+| Int Unsigned | INT64 | UInt32 |
+| Longblob | | String + hex |
+| Mediumblob | | String + hex |
+| Mediumint | INT32 | Int32 |
+| Mediumint Unsigned | INT32 | UInt32 |
+| Smallint | INT16 | Int16 |
+| Smallint Unsigned | INT32 | UInt16 |
+| Text | String | String |
+| Time | | String |
+| Time(6) | | String |
+| Timestamp | | DateTime64 |
+| Tinyint | INT16 | Int8 |
+| Tinyint Unsigned | INT16 | UInt8 |
+| varbinary(\*) | | String + hex |
+| varchar(\*) | | String |
+| JSON | | String |
+| BYTES | BYTES, io.debezium.bits | String |
+| YEAR | INT32 | INT32 |
+| GEOMETRY | Binary of WKB | String |
+| SET | | Array(String) |
+| ENUM | | Array(String) |
+
+
+### PostgreSQL Data Types
+
+| PostgreSQL Type | Notes |
+|---------------------------|---------------------------------------------------------------------------------------|
+| `SMALLINT` | |
+| `INTEGER` | Supported |
+| `BIGINT` | Supported |
+| `NUMERIC` | Supported |
+| `REAL` | Supported |
+| `DOUBLE PRECISION` | Supported |
+| `BOOLEAN` | Supported |
+| `CHAR(n)` | Supported |
+| `VARCHAR(n)` | Supported |
+| `TEXT` | Supported |
+| `BYTEA` | Supported |
+| `DATE` | Supported |
+| `TIME [ WITHOUT TIME ZONE ]` | Supported |
+| `TIME WITH TIME ZONE` | Supported |
+| `TIMESTAMP [ WITHOUT TIME ZONE ]` | Supported |
+| `TIMESTAMP WITH TIME ZONE` | Supported |
+| `INTERVAL` | Supported |
+| `UUID` | Supported |
+| `INET` | Supported |
+| `MACADDR` | Supported |
+| `JSON` | Supported |
+| `JSONB` | Supported |
+| `HSTORE` | Supported |
+| `ENUM` | Supported |
+| `ARRAY` | Supported, but arrays of unsupported types are not supported |
+| `GEOMETRY` (PostGIS) | Not supported |
+| `GEOGRAPHY` (PostGIS) | Not supported |
+| `CITEXT` | Supported |
+| `BIT` | Not supported |
+| `BIT VARYING` | Not supported |
+| `MONEY` | Not supported |
+| `XML` | Not supported |
+| `OID` | Not supported |
+| `UNSUPPORTED` | Types other than those listed are not supported |

From af488d30a8318e2830b6446aa5a4ccf160ac2be1 Mon Sep 17 00:00:00 2001
From: Kanthi Subramanian
Date: Tue, 12 Nov 2024 18:00:57 -0500
Subject: [PATCH 6/7] Added data types link to main document

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 1028efe4f..a344ece07 100644
--- a/README.md
+++ b/README.md
@@ -62,6 +62,7 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
 * [Adding new tables(Incremental Snapshot)](doc/incremental_snapshot.md)
 * [Configuration](doc/configuration.md)
 * [State Storage](doc/state_storage.md)
+* [Data Type Mapping](doc/data_types.md)
 ### Operations

From bf93dbc9fb9c2604e7da31c220b372a9fa7d21f5 Mon Sep 17 00:00:00 2001
From: Kanthi Subramanian
Date: Mon, 18 Nov 2024 11:25:06 -0500
Subject: [PATCH 7/7] Added tech comparison

---
 doc/comparison.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/comparison.md b/doc/comparison.md
index 16bac83ac..d434309fa 100644
--- a/doc/comparison.md
+++ b/doc/comparison.md
@@ -13,3 +13,8 @@
 | **Additional Infrastructure** | None | None | None | Optional (scheduling tools like Airflow) |
 | **Data Accuracy** | High (real-time CDC) | Medium (depends on sync frequency) | Medium | High |
 | **Ideal Use Case** | Low-latency, real-time replication without Kafka | Batch syncs, easy setup | Simple queries without replication | Custom, flexible ETL |
+
+
+| Feature | Altinity Sink Connector (Lightweight, Single Binary) | Airbyte |
+|---------------------------------|------------------------------------------------------|--------------------------------|
+|
\ No newline at end of file
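
Note on `doc/data_types.md`: the MySQL table added in PATCH 5/7 can be read as a plain lookup from source column type to ClickHouse type. The sketch below is only an illustration of that reading and is not code from the connector. The names `MYSQL_TO_CLICKHOUSE` and `clickhouse_type` are hypothetical, only a subset of rows is covered, and generalizing the `Decimal(30,12)` row to any precision and scale is an assumption.

```python
import re

# Illustrative subset of the MySQL -> ClickHouse rows in doc/data_types.md.
# Hypothetical helper for reading the table; not the connector's implementation.
MYSQL_TO_CLICKHOUSE = {
    "bigint": "Int64",
    "bigint unsigned": "UInt64",
    "int": "Int32",
    "int unsigned": "UInt32",
    "mediumint": "Int32",
    "mediumint unsigned": "UInt32",
    "smallint": "Int16",
    "smallint unsigned": "UInt16",
    "tinyint": "Int8",
    "tinyint unsigned": "UInt8",
    "double": "Float64",
    "text": "String",
    "varchar": "String",
    "json": "String",
}


def clickhouse_type(mysql_type: str) -> str:
    """Return the ClickHouse type listed in the table for a MySQL column type."""
    # Lowercase and collapse runs of whitespace so lookups are uniform.
    normalized = " ".join(mysql_type.lower().split())

    # DateTime(p) rows map to DateTime64(p) with the same precision (0 to 6).
    match = re.fullmatch(r"datetime\(([0-6])\)", normalized)
    if match:
        return f"DateTime64({match.group(1)})"

    # Assumption: Decimal(p,s) keeps its precision and scale, as in the
    # single Decimal(30,12) row of the table.
    match = re.fullmatch(r"decimal\((\d+),\s*(\d+)\)", normalized)
    if match:
        return f"Decimal({match.group(1)},{match.group(2)})"

    # Drop a single length or display width such as varchar(255) or int(11).
    base = re.sub(r"\(\s*\d+\s*\)", "", normalized).strip()
    if base in MYSQL_TO_CLICKHOUSE:
        return MYSQL_TO_CLICKHOUSE[base]
    raise KeyError(f"no mapping listed for MySQL type: {mysql_type!r}")
```

For example, `clickhouse_type("int(11) unsigned")` resolves through the `Int Unsigned` row, while a type outside this subset raises `KeyError` instead of guessing a target type.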