diff --git a/README.md b/README.md
index 92f20442a..b83b1bda9 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ transactional database tables in MySQL and PostgreSQL to ClickHouse
 for analysis.
 
 ## Features
+Refer to the [Feature Matrix](doc/feature_matrix.md) for a detailed list of features.
 
 * [Initial data dump and load(MySQL)](sink-connector/python/README.md)
 * Change data capture of new transactions using [Debezium](https://debezium.io/)
@@ -61,6 +62,7 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
 * [Adding new tables(Incremental Snapshot)](doc/incremental_snapshot.md)
 * [Configuration](doc/configuration.md)
 * [State Storage](doc/state_storage.md)
+* [Data Type Mapping](doc/data_types.md)
 
 ### Operations
 
@@ -72,6 +74,9 @@ First two are good tutorials on MySQL and PostgreSQL respectively.
 * [Development](doc/development.md)
 * [Testing](doc/TESTING.md)
 
+## Comparison with other technologies
+- [Comparison](doc/comparison.md)
+
 ## Roadmap
 
 [2024 Roadmap](https://github.com/Altinity/clickhouse-sink-connector/issues/401)
diff --git a/doc/comparison.md b/doc/comparison.md
new file mode 100644
index 000000000..d434309fa
--- /dev/null
+++ b/doc/comparison.md
@@ -0,0 +1,15 @@
+| Feature | Altinity Sink Connector (Lightweight, Single Binary) | Airbyte | ClickHouse `mysql` Table Engine | Custom Python Script with ClickHouse Connect |
+|---------------------------------|------------------------------------------------------|--------------------------------|----------------------------------------|-----------------------------------------------|
+| **Replication Type** | Real-time CDC | Batch (scheduled) | Direct query | Batch or scheduled |
+| **Data Freshness** | Near real-time | Configurable (e.g., hourly) | Near real-time (with latency) | Configurable |
+| **Schema Change Handling** | Full support (MySQL), partial (PostgreSQL) | Manual schema refresh required | No automatic schema sync | Manual intervention needed |
+| **Complexity** | Low to medium (single binary setup) | Moderate | Low | High (requires coding and scheduling) |
+| **Ease of Setup** | Easy (standalone binary, no Kafka needed) | Easy | Very easy | Complex (custom coding) |
+| **Maintenance** | Low to moderate (single binary process) | Low | Low | High |
+| **Initial Sync Support** | Yes | Yes | Not applicable (direct query) | Yes |
+| **Transformation Capabilities** | Limited | Basic (Airbyte transformations) | No | Full control (custom code) |
+| **Cost** | Free or license-based | Free (open source) | Free (built into ClickHouse) | Free (but may require custom infrastructure) |
+| **Suitability for High Volume** | High | Medium | Medium | Medium to low |
+| **Additional Infrastructure** | None | None | None | Optional (scheduling tools such as Airflow) |
+| **Data Accuracy** | High (real-time CDC) | Medium (depends on sync frequency) | Medium | High |
+| **Ideal Use Case** | Low-latency, real-time replication without Kafka | Batch syncs, easy setup | Simple queries without replication | Custom, flexible ETL |
diff --git a/doc/data_types.md b/doc/data_types.md
new file mode 100644
index 000000000..cab08f7d9
--- /dev/null
+++ b/doc/data_types.md
@@ -0,0 +1,76 @@
+## MySQL Data Types
+Refer to the [Debezium documentation](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) for details on the supported data types.
+
+| MySQL | Debezium | ClickHouse |
+|--------------------|------------------------------------------------------|---------------------------------|
+| Bigint | INT64\_SCHEMA | Int64 |
+| Bigint Unsigned | INT64\_SCHEMA | UInt64 |
+| Blob | | String + hex |
+| Char | String | String / LowCardinality(String) |
+| Date | Schema: INT64, Name: debezium.Date | Date(6) |
+| DateTime(0/1/2/3) | Schema: INT64, Name: debezium.Timestamp | DateTime64(0/1/2/3) |
+| DateTime(4/5/6) | Schema: INT64, Name: debezium.MicroTimestamp | DateTime64(4/5/6) |
+| Decimal(30,12) | Schema: Bytes, Name: kafka.connect.data.Decimal | Decimal(30,12) |
+| Double | | Float64 |
+| Int | INT32 | Int32 |
+| Int Unsigned | INT64 | UInt32 |
+| Longblob | | String + hex |
+| Mediumblob | | String + hex |
+| Mediumint | INT32 | Int32 |
+| Mediumint Unsigned | INT32 | UInt32 |
+| Smallint | INT16 | Int16 |
+| Smallint Unsigned | INT32 | UInt16 |
+| Text | String | String |
+| Time | | String |
+| Time(6) | | String |
+| Timestamp | | DateTime64 |
+| Tinyint | INT16 | Int8 |
+| Tinyint Unsigned | INT16 | UInt8 |
+| varbinary(\*) | | String + hex |
+| varchar(\*) | | String |
+| JSON | | String |
+| BYTES | BYTES, io.debezium.bits | String |
+| YEAR | INT32 | Int32 |
+| GEOMETRY | WKB binary | String |
+| SET | | Array(String) |
+| ENUM | | Array(String) |
+
+
+### PostgreSQL Data Types
+
+| PostgreSQL Type | Notes |
+|---------------------------|---------------------------------------------------------------------------------------|
+| `SMALLINT` | Supported |
+| `INTEGER` | Supported |
+| `BIGINT` | Supported |
+| `NUMERIC` | Supported |
+| `REAL` | Supported |
+| `DOUBLE PRECISION` | Supported |
+| `BOOLEAN` | Supported |
+| `CHAR(n)` | Supported |
+| `VARCHAR(n)` | Supported |
+| `TEXT` | Supported |
+| `BYTEA` | Supported |
+| `DATE` | Supported |
+| `TIME [ WITHOUT TIME ZONE ]` | Supported |
+| `TIME WITH TIME ZONE` | Supported |
+| `TIMESTAMP [ WITHOUT TIME ZONE ]` | Supported |
+| `TIMESTAMP WITH TIME ZONE` | Supported |
+| `INTERVAL` | Supported |
+| `UUID` | Supported |
+| `INET` | Supported |
+| `MACADDR` | Supported |
+| `JSON` | Supported |
+| `JSONB` | Supported |
+| `HSTORE` | Supported |
+| `ENUM` | Supported |
+| `ARRAY` | Supported, except arrays of unsupported element types |
+| `GEOMETRY` (PostGIS) | Not supported |
+| `GEOGRAPHY` (PostGIS) | Not supported |
+| `CITEXT` | Supported |
+| `BIT` | Not supported |
+| `BIT VARYING` | Not supported |
+| `MONEY` | Not supported |
+| `XML` | Not supported |
+| `OID` | Not supported |
+| Other types | Types other than those listed above are not supported |
diff --git a/doc/feature_matrix.md b/doc/feature_matrix.md
new file mode 100644
index 000000000..1428c8fe3
--- /dev/null
+++ b/doc/feature_matrix.md
@@ -0,0 +1,26 @@
+## Features
+
+| Feature | Description |
+| ------- | ----------- |
+| Single Binary | No additional dependencies or infrastructure required |
+| Exactly-Once Processing | Offsets are committed to ClickHouse only after the messages are written to ClickHouse |
+| Supported Databases | MySQL, MariaDB, PostgreSQL, MongoDB (experimental) |
+| Supported ClickHouse Versions | 24.8 and above |
+| ClickHouse Table Types | ReplacingMergeTree, MergeTree, ReplicatedReplacingMergeTree |
+| Replication Start Positioning | Use sink-connector-client to start replication from a specific offset or LSN (MySQL binlog position, PostgreSQL LSN) |
+| Supported Data Types | Refer to [Data Types](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-supported-data-types) |
+| Initial Data Load | Scripts to perform the initial data load (MySQL) |
+| Fault Tolerance | Replication continues from the last committed offset/LSN in case of a failure |
+| Update, Delete | Supported with ReplacingMergeTree |
+| Monitoring | Prometheus metrics, Grafana dashboard |
+| Schema Evolution | DDL support for MySQL |
+| Deployment Models | Docker Compose, Java JAR file, Kubernetes |
+| Start, Stop, Pause, Resume Replication | Supported using sink-connector-client |
+| Filter source databases, tables, columns | Supported using Debezium configuration |
+| Map source databases to different ClickHouse databases | Database name overrides are supported |
+| Column name overrides | Planned |
+| MySQL extensive DDL support | [Full list of supported DDL](sink-connector-lightweight/docs/mysql-ddl-support.md) |
+| Replication Lag Monitoring | Grafana dashboard and view to monitor lag |
+| Batch inserts to ClickHouse | Configurable batch size and thread pool size to achieve high throughput and low latency |
+| MySQL Generated/Alias/Materialized Columns | Supported |
+| Auto-create tables | Tables are automatically created in ClickHouse based on the source table structure |
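
As an illustration of how the MySQL-to-ClickHouse mapping added in doc/data_types.md can be applied, here is a minimal sketch that derives a ClickHouse `CREATE TABLE` statement from MySQL column definitions. The dictionary transcribes only a subset of the mapping table, and the function name is hypothetical; it is not part of the connector.

```python
# Illustrative sketch only: builds a ClickHouse ReplacingMergeTree DDL from
# MySQL column types, using a subset of the mapping in doc/data_types.md.
# MYSQL_TO_CLICKHOUSE and clickhouse_ddl are hypothetical names.

MYSQL_TO_CLICKHOUSE = {
    "bigint": "Int64",
    "bigint unsigned": "UInt64",
    "int": "Int32",
    "int unsigned": "UInt32",
    "smallint": "Int16",
    "tinyint": "Int8",
    "double": "Float64",
    "text": "String",
    "varchar": "String",
    "json": "String",
}

def clickhouse_ddl(table, columns, order_by):
    """Build a CREATE TABLE statement from (name, mysql_type) pairs."""
    cols = ",\n    ".join(
        f"`{name}` {MYSQL_TO_CLICKHOUSE[mysql_type.lower()]}"
        for name, mysql_type in columns
    )
    return (
        f"CREATE TABLE {table} (\n    {cols}\n) "
        f"ENGINE = ReplacingMergeTree ORDER BY {order_by}"
    )

ddl = clickhouse_ddl("orders", [("id", "BIGINT"), ("note", "TEXT")], "id")
print(ddl)
```

The `ReplacingMergeTree` engine is chosen here because, per the feature matrix, it is the table type that supports updates and deletes.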