Add feature matrix doc #915

Merged
merged 8 commits into from
Nov 22, 2024
5 changes: 5 additions & 0 deletions README.md
@@ -15,6 +15,7 @@ transactional database tables in MySQL and PostgreSQL to ClickHouse
for analysis.

## Features
Refer to the [Feature Matrix](doc/feature_matrix.md) for a detailed list of features.

* [Initial data dump and load(MySQL)](sink-connector/python/README.md)
* Change data capture of new transactions using [Debezium](https://debezium.io/)
@@ -61,6 +62,7 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
* [Adding new tables(Incremental Snapshot)](doc/incremental_snapshot.md)
* [Configuration](doc/configuration.md)
* [State Storage](doc/state_storage.md)
* [Data Type Mapping](doc/data_types.md)

### Operations

@@ -72,6 +74,9 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
* [Development](doc/development.md)
* [Testing](doc/TESTING.md)

## Comparison with other technologies
* [Comparison](doc/comparison.md)

## Roadmap

[2024 Roadmap](https://github.com/Altinity/clickhouse-sink-connector/issues/401)
20 changes: 20 additions & 0 deletions doc/comparison.md
@@ -0,0 +1,20 @@
| Feature | Altinity Sink Connector (Lightweight, Single Binary) | Airbyte | ClickHouse `mysql` Table Engine | Custom Python Script with ClickHouse Connect |
|---------------------------------|------------------------------------------------------|--------------------------------|----------------------------------------|-----------------------------------------------|
| **Replication Type** | Real-time CDC | Batch (Scheduled) | Direct Query | Batch or Scheduled |
| **Data Freshness** | Near real-time | Configurable (e.g., hourly) | Near real-time (with latency) | Configurable |
| **Schema Change Handling**      | Full support (MySQL), partial (PostgreSQL)           | Manual schema refresh required | No automatic schema sync               | Manual intervention needed                    |
| **Complexity** | Low to Medium (single binary setup) | Moderate | Low | High (requires coding and scheduling) |
| **Ease of Setup** | Easy (standalone binary, no Kafka needed) | Easy | Very easy | Complex (custom coding) |
| **Maintenance** | Low to Moderate (single binary process) | Low | Low | High |
| **Initial Sync Support** | Yes | Yes | Not applicable (direct query) | Yes |
| **Transformation Capabilities** | Limited | Basic (Airbyte transformations)| No | Full control (custom code) |
| **Cost** | Free or license-based | Free (Open-source) | Free (built-in to ClickHouse) | Free (but may require custom infrastructure) |
| **Suitability for High Volume** | High | Medium | Medium | Medium to Low |
| **Additional Infrastructure** | None | None | None | Optional (scheduling tools like Airflow) |
| **Data Accuracy** | High (real-time CDC) | Medium (depends on sync frequency) | Medium | High |
| **Ideal Use Case** | Low-latency, real-time replication without Kafka | Batch syncs, easy setup | Simple queries without replication | Custom, flexible ETL |


| Feature | Altinity Sink Connector (Lightweight, Single Binary) | Airbyte |
|---------------------------------|------------------------------------------------------|--------------------------------|
|
76 changes: 76 additions & 0 deletions doc/data_types.md
@@ -0,0 +1,76 @@
## MySQL Data Types
Refer to the [Debezium documentation](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) for details on each data type.

| MySQL | Debezium | ClickHouse |
|--------------------|------------------------------------------------------|---------------------------------|
| Bigint | INT64\_SCHEMA | Int64 |
| Bigint Unsigned | INT64\_SCHEMA | UInt64 |
| Blob | | String + hex |
| Char | String | String / LowCardinality(String) |
| Date | Schema: INT64<br>Name:<br>debezium.Date | Date(6) |
| DateTime(0/1/2/3) | Schema: INT64<br>Name: debezium.Timestamp | DateTime64(0/1/2/3) |
| DateTime(4/5/6) | Schema: INT64<br>Name: debezium.MicroTimestamp | DateTime64(4/5/6) |
| Decimal(30,12) | Schema: Bytes<br>Name:<br>kafka.connect.data.Decimal | Decimal(30,12) |
| Double | | Float64 |
| Int | INT32 | Int32 |
| Int Unsigned | INT64 | UInt32 |
| Longblob | | String + hex |
| Mediumblob | | String + hex |
| Mediumint | INT32 | Int32 |
| Mediumint Unsigned | INT32 | UInt32 |
| Smallint | INT16 | Int16 |
| Smallint Unsigned | INT32 | UInt16 |
| Text | String | String |
| Time | | String |
| Time(6) | | String |
| Timestamp | | DateTime64 |
| Tinyint | INT16 | Int8 |
| Tinyint Unsigned | INT16 | UInt8 |
| varbinary(\*) | | String + hex |
| varchar(\*) | | String |
| JSON | | String |
| BYTES | BYTES, io.debezium.bits | String |
| YEAR               | INT32                                                | Int32                           |
| GEOMETRY | Binary of WKB | String |
| SET | | Array(String) |
| ENUM | | Array(String) |
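
The mapping table above can be read as a simple lookup. The following is a minimal sketch in Python, not the connector's implementation: the dictionary `MYSQL_TO_CLICKHOUSE` and the function `clickhouse_type` are hypothetical names, the dictionary covers only a subset of the rows, and generalizing `Decimal(30,12)` to any precision and scale is an assumption.

```python
import re

# Subset of the rows in the table above (hypothetical helper, not connector code).
MYSQL_TO_CLICKHOUSE = {
    "bigint": "Int64",
    "bigint unsigned": "UInt64",
    "int": "Int32",
    "int unsigned": "UInt32",
    "smallint": "Int16",
    "smallint unsigned": "UInt16",
    "tinyint": "Int8",
    "tinyint unsigned": "UInt8",
    "double": "Float64",
    "text": "String",
    "json": "String",
    "set": "Array(String)",
}


def clickhouse_type(mysql_type: str) -> str:
    """Return the ClickHouse type the table lists for a MySQL type."""
    normalized = " ".join(mysql_type.lower().split())

    # DateTime(n) maps to DateTime64(n) with the same precision (0 to 6).
    match = re.fullmatch(r"datetime\(([0-6])\)", normalized)
    if match:
        return f"DateTime64({match.group(1)})"

    # Assumption: Decimal(p,s) keeps its precision and scale, as the
    # Decimal(30,12) row suggests.
    match = re.fullmatch(r"decimal\((\d+),\s*(\d+)\)", normalized)
    if match:
        return f"Decimal({match.group(1)},{match.group(2)})"

    # varchar(n) maps to String regardless of length.
    if re.fullmatch(r"varchar\(\d+\)", normalized):
        return "String"

    # Raises KeyError for types this sketch does not cover.
    return MYSQL_TO_CLICKHOUSE[normalized]
```

For example, `clickhouse_type("Bigint Unsigned")` looks up the unsigned row, and `clickhouse_type("DateTime(3)")` carries the precision over to `DateTime64`.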


## PostgreSQL Data Types

| PostgreSQL Type | Notes |
|---------------------------|---------------------------------------------------------------------------------------|
| `SMALLINT` | |
| `INTEGER` | Supported |
| `BIGINT` | Supported |
| `NUMERIC` | Supported |
| `REAL` | Supported |
| `DOUBLE PRECISION` | Supported |
| `BOOLEAN` | Supported |
| `CHAR(n)` | Supported |
| `VARCHAR(n)` | Supported |
| `TEXT` | Supported |
| `BYTEA` | Supported |
| `DATE` | Supported |
| `TIME [ WITHOUT TIME ZONE ]` | Supported |
| `TIME WITH TIME ZONE` | Supported |
| `TIMESTAMP [ WITHOUT TIME ZONE ]` | Supported |
| `TIMESTAMP WITH TIME ZONE` | Supported |
| `INTERVAL` | Supported |
| `UUID` | Supported |
| `INET` | Supported |
| `MACADDR` | Supported |
| `JSON` | Supported |
| `JSONB` | Supported |
| `HSTORE` | Supported |
| `ENUM` | Supported |
| `ARRAY`                   | Supported, except arrays whose element type is unsupported                            |
| `GEOMETRY` (PostGIS) | Not supported |
| `GEOGRAPHY` (PostGIS) | Not supported |
| `CITEXT` | Supported |
| `BIT` | Not supported |
| `BIT VARYING` | Not supported |
| `MONEY` | Not supported |
| `XML` | Not supported |
| `OID` | Not supported |
| `UNSUPPORTED` | Types other than those listed are not supported |
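
As a rough illustration of how the table above could be applied, the sketch below checks a PostgreSQL column type against the rows marked "Supported", including the array rule. The names `SUPPORTED_PG_TYPES` and `is_supported` are hypothetical and not part of the connector. `SMALLINT` is deliberately left out because the table does not state its status.

```python
import re

# Types the table marks as "Supported" (hypothetical helper, not connector code).
# SMALLINT is omitted: its Notes cell in the table is blank.
SUPPORTED_PG_TYPES = {
    "INTEGER", "BIGINT", "NUMERIC", "REAL", "DOUBLE PRECISION", "BOOLEAN",
    "CHAR", "VARCHAR", "TEXT", "BYTEA", "DATE",
    "TIME", "TIME WITHOUT TIME ZONE", "TIME WITH TIME ZONE",
    "TIMESTAMP", "TIMESTAMP WITHOUT TIME ZONE", "TIMESTAMP WITH TIME ZONE",
    "INTERVAL", "UUID", "INET", "MACADDR", "JSON", "JSONB", "HSTORE",
    "ENUM", "CITEXT",
}


def is_supported(pg_type: str) -> bool:
    """Return True if the table lists the type as supported."""
    name = " ".join(pg_type.upper().split())

    # An array is supported only if its element type is supported.
    if name.endswith("[]"):
        return is_supported(name[:-2])

    # Drop a length modifier such as the (20) in VARCHAR(20).
    name = re.sub(r"\(\d+\)", "", name).strip()

    # Anything not listed is treated as unsupported, per the last table row.
    return name in SUPPORTED_PG_TYPES
```

With this sketch, `is_supported("integer[]")` follows the element type, while `is_supported("xml[]")` is rejected because `XML` itself is not supported.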
26 changes: 26 additions & 0 deletions doc/feature_matrix.md
@@ -0,0 +1,26 @@
## Features

| Feature | Description |
| ------- | --------- |
| Single Binary | No additional dependencies or infrastructure required |
| Exactly Once Processing| Offsets are committed to ClickHouse after the messages are written to ClickHouse |
| Supported Databases | MySQL, MariaDB, PostgreSQL, MongoDB (experimental) |
| Supported ClickHouse Versions | 24.8 and above |
| ClickHouse Table Types | ReplacingMergeTree, MergeTree, ReplicatedReplacingMergeTree |
| Replication Start Positioning | Use sink-connector-client to start replication from a specific offset or LSN (MySQL binlog position, PostgreSQL LSN) |
| Supported Datatypes | Refer to the [Debezium data types](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) |
| Initial Data load | Scripts to perform initial data load (MySQL) |
| Fault Tolerance | Sink Connector Client to continue replication from the last committed offset/LSN in case of a failure |
| Update, Delete | Supported with ReplacingMergeTree |
| Monitoring | Prometheus Metrics, Grafana Dashboard |
| Schema Evolution | DDL support for MySQL |
| Deployment Models | Docker Compose, Java JAR file, Kubernetes |
| Start, Stop, Pause, Resume Replication | Supported using sink-connector-client |
| Filter source databases, tables, columns | Supported using Debezium configuration |
| Map source databases to different ClickHouse databases | Database name overrides supported |
| Column name overrides | Planned |
| MySQL extensive DDL support | Full list of supported DDL: sink-connector-lightweight/docs/mysql-ddl-support.md |
| Replication Lag Monitoring | Grafana Dashboard and view to monitor lag |
| Batch inserts to ClickHouse | Configurable batch size and thread pool size to tune for high throughput or low latency |
| MySQL Generated/Alias/Materialized Columns | Supported |
| Auto create tables | Tables are automatically created in ClickHouse based on the source table structure |
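
The "Exactly Once Processing" and "Fault Tolerance" rows describe an ordering: a batch is written first, its offset is committed afterwards, and a restart resumes from the last committed offset. The sketch below illustrates that ordering with an in-memory stand-in. It is not the connector's code; `InMemorySink` and `replicate` are hypothetical names, and list indexes stand in for binlog positions or LSNs.

```python
class InMemorySink:
    """Stand-in for ClickHouse: holds both the rows and the committed offset."""

    def __init__(self):
        self.rows = []
        self.committed_offset = -1  # -1 means nothing has been committed yet

    def write(self, batch):
        # batch is a list of (offset, row) pairs
        self.rows.extend(row for _, row in batch)

    def commit(self, offset):
        self.committed_offset = offset


def replicate(events, sink, batch_size=2):
    """Write events in batches, committing each offset only after its write."""
    start = sink.committed_offset + 1  # resume after the last committed offset
    for first in range(start, len(events), batch_size):
        last = min(first + batch_size, len(events))
        batch = [(offset, events[offset]) for offset in range(first, last)]
        sink.write(batch)
        sink.commit(batch[-1][0])  # commit happens after the write


# Usage: a second run over a longer event log picks up where the first stopped.
sink = InMemorySink()
replicate(["a", "b", "c"], sink)
replicate(["a", "b", "c", "d", "e"], sink)
```

After the second call, only the events past the committed offset are written, so rows already in the sink are not inserted again.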