Merge pull request #915 from Altinity/add_feature_matrix_doc
Add feature matrix doc
subkanthi authored Nov 22, 2024
2 parents c0980f3 + bf93dbc commit 33a16ad
Showing 4 changed files with 127 additions and 0 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -15,6 +15,7 @@ transactional database tables in MySQL and PostgreSQL to ClickHouse
for analysis.

## Features
Refer to the [Feature Matrix](doc/feature_matrix.md) for a detailed list of features.

* [Initial data dump and load(MySQL)](sink-connector/python/README.md)
* Change data capture of new transactions using [Debezium](https://debezium.io/)
@@ -61,6 +62,7 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
* [Adding new tables(Incremental Snapshot)](doc/incremental_snapshot.md)
* [Configuration](doc/configuration.md)
* [State Storage](doc/state_storage.md)
* [Data Type Mapping](doc/data_types.md)

### Operations

@@ -72,6 +74,9 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
* [Development](doc/development.md)
* [Testing](doc/TESTING.md)

## Comparison with other technologies
- [Comparison](doc/comparison.md)

## Roadmap

[2024 Roadmap](https://github.com/Altinity/clickhouse-sink-connector/issues/401)
20 changes: 20 additions & 0 deletions doc/comparison.md
@@ -0,0 +1,20 @@
| Feature | Altinity Sink Connector (Lightweight, Single Binary) | Airbyte | ClickHouse `mysql` Table Engine | Custom Python Script with ClickHouse Connect |
|---------------------------------|------------------------------------------------------|--------------------------------|----------------------------------------|-----------------------------------------------|
| **Replication Type** | Real-time CDC | Batch (Scheduled) | Direct Query | Batch or Scheduled |
| **Data Freshness** | Near real-time | Configurable (e.g., hourly) | Near real-time (with latency) | Configurable |
| **Schema Change Handling**      | Full support (MySQL), partial (PostgreSQL)           | Manual schema refresh required | No automatic schema sync               | Manual intervention needed                    |
| **Complexity** | Low to Medium (single binary setup) | Moderate | Low | High (requires coding and scheduling) |
| **Ease of Setup** | Easy (standalone binary, no Kafka needed) | Easy | Very easy | Complex (custom coding) |
| **Maintenance** | Low to Moderate (single binary process) | Low | Low | High |
| **Initial Sync Support** | Yes | Yes | Not applicable (direct query) | Yes |
| **Transformation Capabilities** | Limited | Basic (Airbyte transformations)| No | Full control (custom code) |
| **Cost** | Free or license-based | Free (Open-source) | Free (built-in to ClickHouse) | Free (but may require custom infrastructure) |
| **Suitability for High Volume** | High | Medium | Medium | Medium to Low |
| **Additional Infrastructure** | None | None | None | Optional (scheduling tools like Airflow) |
| **Data Accuracy** | High (real-time CDC) | Medium (depends on sync frequency) | Medium | High |
| **Ideal Use Case** | Low-latency, real-time replication without Kafka | Batch syncs, easy setup | Simple queries without replication | Custom, flexible ETL |


| Feature | Altinity Sink Connector (Lightweight, Single Binary) | Airbyte |
|---------------------------------|------------------------------------------------------|--------------------------------|
|
76 changes: 76 additions & 0 deletions doc/data_types.md
@@ -0,0 +1,76 @@
## MySQL Data Types
Refer to the [Debezium documentation](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) for details on each data type.

| MySQL | Debezium | ClickHouse |
|--------------------|------------------------------------------------------|---------------------------------|
| Bigint | INT64\_SCHEMA | Int64 |
| Bigint Unsigned | INT64\_SCHEMA | UInt64 |
| Blob | | String + hex |
| Char | String | String / LowCardinality(String) |
| Date | Schema: INT64<br>Name:<br>debezium.Date | Date(6) |
| DateTime(0/1/2/3) | Schema: INT64<br>Name: debezium.Timestamp | DateTime64(0/1/2/3) |
| DateTime(4/5/6) | Schema: INT64<br>Name: debezium.MicroTimestamp | DateTime64(4/5/6) |
| Decimal(30,12) | Schema: Bytes<br>Name:<br>kafka.connect.data.Decimal | Decimal(30,12) |
| Double | | Float64 |
| Int | INT32 | Int32 |
| Int Unsigned | INT64 | UInt32 |
| Longblob | | String + hex |
| Mediumblob | | String + hex |
| Mediumint | INT32 | Int32 |
| Mediumint Unsigned | INT32 | UInt32 |
| Smallint | INT16 | Int16 |
| Smallint Unsigned | INT32 | UInt16 |
| Text | String | String |
| Time | | String |
| Time(6) | | String |
| Timestamp | | DateTime64 |
| Tinyint | INT16 | Int8 |
| Tinyint Unsigned | INT16 | UInt8 |
| varbinary(\*) | | String + hex |
| varchar(\*) | | String |
| JSON | | String |
| BYTES | BYTES, io.debezium.bits | String |
| YEAR               | INT32                                                | Int32                           |
| GEOMETRY | Binary of WKB | String |
| SET | | Array(String) |
| ENUM | | Array(String) |
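
As a rough illustration of how the table above can be read, the following Python sketch encodes a subset of its rows as a lookup. The names `MYSQL_TO_CLICKHOUSE` and `clickhouse_type` are hypothetical and are not part of the connector; the sketch only restates rows from the table and does not describe how the connector performs the mapping internally.

```python
# Illustrative subset of the MySQL -> ClickHouse rows in the table above.
# The dictionary and function names are hypothetical, not connector APIs.
MYSQL_TO_CLICKHOUSE = {
    "bigint": "Int64",
    "bigint unsigned": "UInt64",
    "int": "Int32",
    "int unsigned": "UInt32",
    "mediumint": "Int32",
    "mediumint unsigned": "UInt32",
    "smallint": "Int16",
    "smallint unsigned": "UInt16",
    "tinyint": "Int8",
    "tinyint unsigned": "UInt8",
    "double": "Float64",
    "text": "String",
    "json": "String",
    "set": "Array(String)",
}


def clickhouse_type(mysql_type: str) -> str:
    """Return the ClickHouse type listed in the table for a MySQL type."""
    # Lowercase and collapse whitespace so "Int  Unsigned" matches "int unsigned".
    normalized = " ".join(mysql_type.lower().split())

    # DateTime(p) keeps its precision: DateTime(3) -> DateTime64(3).
    if normalized.startswith("datetime(") and normalized.endswith(")"):
        precision = int(normalized[len("datetime("):-1])
        if not 0 <= precision <= 6:
            raise ValueError(f"unexpected DateTime precision: {precision}")
        return f"DateTime64({precision})"

    # varchar(*) maps to String regardless of length.
    if normalized.startswith("varchar("):
        return "String"

    try:
        return MYSQL_TO_CLICKHOUSE[normalized]
    except KeyError:
        raise ValueError(f"type not covered by this sketch: {mysql_type}") from None
```

For example, `clickhouse_type("Int Unsigned")` looks up the `Int Unsigned` row, and `clickhouse_type("DateTime(3)")` follows the `DateTime(0/1/2/3)` row.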


## PostgreSQL Data Types

| PostgreSQL Type | Notes |
|---------------------------|---------------------------------------------------------------------------------------|
| `SMALLINT` | |
| `INTEGER` | Supported |
| `BIGINT` | Supported |
| `NUMERIC` | Supported |
| `REAL` | Supported |
| `DOUBLE PRECISION` | Supported |
| `BOOLEAN` | Supported |
| `CHAR(n)` | Supported |
| `VARCHAR(n)` | Supported |
| `TEXT` | Supported |
| `BYTEA` | Supported |
| `DATE` | Supported |
| `TIME [ WITHOUT TIME ZONE ]` | Supported |
| `TIME WITH TIME ZONE` | Supported |
| `TIMESTAMP [ WITHOUT TIME ZONE ]` | Supported |
| `TIMESTAMP WITH TIME ZONE` | Supported |
| `INTERVAL` | Supported |
| `UUID` | Supported |
| `INET` | Supported |
| `MACADDR` | Supported |
| `JSON` | Supported |
| `JSONB` | Supported |
| `HSTORE` | Supported |
| `ENUM` | Supported |
| `ARRAY` | Supported, but arrays of unsupported types are not supported |
| `GEOMETRY` (PostGIS) | Not supported |
| `GEOGRAPHY` (PostGIS) | Not supported |
| `CITEXT` | Supported |
| `BIT` | Not supported |
| `BIT VARYING` | Not supported |
| `MONEY` | Not supported |
| `XML` | Not supported |
| `OID` | Not supported |
| `UNSUPPORTED` | Types other than those listed are not supported |
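
The array rule in the table (arrays are supported only when their element type is) can be sketched as a small lookup. This is a minimal Python sketch under stated assumptions: the function `support_status` and the three sets are hypothetical names, array types are assumed to be written with a trailing `[]`, and `SMALLINT` is reported as "unspecified" because its Notes cell is empty in the table.

```python
import re

# Types marked "Supported" in the table above (lowercased).
SUPPORTED = {
    "integer", "bigint", "numeric", "real", "double precision", "boolean",
    "char", "varchar", "text", "bytea", "date",
    "time", "time without time zone", "time with time zone",
    "timestamp", "timestamp without time zone", "timestamp with time zone",
    "interval", "uuid", "inet", "macaddr", "json", "jsonb", "hstore",
    "enum", "citext",
}
# Listed in the table with an empty Notes cell.
UNSPECIFIED = {"smallint"}


def support_status(pg_type: str) -> str:
    """Return 'supported', 'not supported', or 'unspecified' per the table."""
    name = " ".join(pg_type.lower().split())

    # An array inherits the status of its element type.
    if name.endswith("[]"):
        return support_status(name[:-2])

    # Drop length/precision modifiers such as (255) or (10,2).
    name = re.sub(r"\s*\([^)]*\)", "", name).strip()

    if name in SUPPORTED:
        return "supported"
    if name in UNSPECIFIED:
        return "unspecified"
    # Explicitly unsupported types and anything not listed fall through here.
    return "not supported"
```

With this sketch, `support_status("integer[]")` follows the `INTEGER` row, while `support_status("money[]")` follows the `MONEY` row.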
26 changes: 26 additions & 0 deletions doc/feature_matrix.md
@@ -0,0 +1,26 @@
## Features

| Feature | Description |
| ------- | --------- |
| Single Binary | No additional dependencies or infrastructure required |
| Exactly Once Processing| Offsets are committed to ClickHouse after the messages are written to ClickHouse |
| Supported Databases | MySQL, MariaDB, PostgreSQL, MongoDB (Experimental) |
| Supported ClickHouse Versions | 24.8 and above |
| ClickHouse Table Types | ReplacingMergeTree, MergeTree, ReplicatedReplacingMergeTree |
| Replication Start Positioning | Use sink-connector-client to start replication from a specific offset or LSN (MySQL binlog position, PostgreSQL LSN) |
| Supported Datatypes | Refer to [Datatypes](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) |
| Initial Data Load | Scripts to perform initial data load (MySQL) |
| Fault Tolerance | Sink Connector Client to continue replication from the last committed offset/LSN in case of a failure |
| Update, Delete | Supported with ReplacingMergeTree |
| Monitoring | Prometheus Metrics, Grafana Dashboard |
| Schema Evolution | DDL support for MySQL |
| Deployment Models | Docker Compose, Java JAR file, Kubernetes |
| Start, Stop, Pause, Resume Replication | Supported using sink-connector-client |
| Filter source databases, tables, columns | Supported using Debezium configuration |
| Map source databases to different ClickHouse databases | Database name overrides supported |
| Column name overrides | Planned |
| MySQL extensive DDL support | Full list of DDL: sink-connector-lightweight/docs/mysql-ddl-support.md |
| Replication Lag Monitoring | Grafana Dashboard and view to monitor lag |
| Batch inserts to ClickHouse | Configurable batch size/thread pool size to achieve high throughput/low latency |
| MySQL Generated/Alias/Materialized Columns | Supported |
| Auto create tables | Tables are automatically created in ClickHouse based on the source table structure |
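
The "Exactly Once Processing" and "Fault Tolerance" rows describe an ordering: a batch is written first, and its offset is committed only afterwards, so a restart resumes from the last committed offset. The Python sketch below simulates that ordering in memory. Everything in it is hypothetical (`FakeClickHouse`, `replicate`, the `crash_before_commit_at` parameter); it does not use or describe the connector's actual code. It also assumes that a replayed batch is harmless because rows with the same key are collapsed, which is modeled here as a dictionary standing in for a ReplacingMergeTree table after merges.

```python
class FakeClickHouse:
    """In-memory stand-in for the sink: stores rows and the committed offset."""

    def __init__(self):
        # key -> (offset used as version, value); newer versions replace older ones.
        self.rows = {}
        self.committed_offset = -1

    def write_batch(self, events):
        for offset, key, value in events:
            current = self.rows.get(key)
            if current is None or offset >= current[0]:
                self.rows[key] = (offset, value)

    def commit_offset(self, offset):
        self.committed_offset = offset


def replicate(source_events, sink, batch_size=2, crash_before_commit_at=None):
    """Write batches in offset order, committing each offset only after its write."""
    start = sink.committed_offset + 1  # resume after the last committed offset
    pending = [event for event in source_events if event[0] >= start]
    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        sink.write_batch(batch)
        last_offset = batch[-1][0]
        if crash_before_commit_at == last_offset:
            # Simulated failure: data is written but the offset is not committed.
            raise RuntimeError("crash after write, before offset commit")
        sink.commit_offset(last_offset)
```

A run that crashes after writing offsets 2 and 3 leaves the committed offset at 1; calling `replicate` again replays offsets 2 and 3, and the keyed replacement keeps the replay from producing duplicate rows in this model.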
