why LEFT JOIN when you can LEFTSHOVE?
An opinionated CDC (change data capture) utility. It captures incremental table snapshots by querying source tables over windows of 'not modified since' (nms) timestamps, automatically creates the BigQuery sink datasets and tables, and uses Benthos Streams under the hood to stream data from source to sink. State is stored locally in SQLite.
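Here is a minimal sketch of the windowed polling described above. It assumes two things. First, each source table has a timestamp column, called `nms` here. Second, the per-table high-water mark lives in a hypothetical SQLite table, `cdc_state`. leftshove's actual state schema and query shape are not documented here. Each window is half-open: it reads rows with `lower < nms <= upper`, and the mark advances only after the window has been delivered. As a result, a given nms value falls into exactly one window.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical state table: one high-water mark per source table.
STATE_DDL = """
CREATE TABLE IF NOT EXISTS cdc_state (
    source_table TEXT PRIMARY KEY,
    last_nms     TEXT NOT NULL
)
"""

# Lower bound used the first time a table is polled.
EPOCH = "1970-01-01T00:00:00+00:00"


def next_window(conn, source_table, now):
    """Return (lower, upper): lower is the stored mark (exclusive), upper is now (inclusive)."""
    row = conn.execute(
        "SELECT last_nms FROM cdc_state WHERE source_table = ?", (source_table,)
    ).fetchone()
    lower = row[0] if row else EPOCH
    return lower, now.isoformat()


def window_query(source_table, nms_field):
    """Build a parameterized PostgreSQL query for one snapshot window."""
    # Identifiers cannot be bound as parameters; real code should quote or validate them.
    return (
        f"SELECT * FROM {source_table} "
        f"WHERE {nms_field} > %(lower)s AND {nms_field} <= %(upper)s"
    )


def commit_window(conn, source_table, upper):
    """Advance the high-water mark once the window has been written to the sink."""
    conn.execute(
        "INSERT OR REPLACE INTO cdc_state (source_table, last_nms) VALUES (?, ?)",
        (source_table, upper),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(STATE_DDL)
    lower, upper = next_window(conn, "public.orders", datetime.now(timezone.utc))
    print(window_query("public.orders", "nms"), {"lower": lower, "upper": upper})
    commit_window(conn, "public.orders", upper)
```

Storing the mark as an ISO 8601 string keeps the SQLite side simple. The string is passed straight through as a query parameter, and PostgreSQL casts it to the column's timestamp type.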
PostgreSQL
BigQuery
- automatic creation of dataset and tables (requires GCP Application Default Credentials with appropriate permissions)
- automatic creation of nms views showing the current state of each table
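The README does not say how the nms views derive current state. One common approach keeps the row with the latest nms per key, using BigQuery's `ROW_NUMBER()` and `SELECT * EXCEPT`. The sketch below takes that approach. It assumes each table has a known key column set (the hypothetical `key_columns` parameter) and names views with a hypothetical `_current` suffix. Note that timestamp-based polling cannot observe hard deletes, so deleted rows would remain in such a view.

```python
def nms_view_ddl(project, dataset, table, key_columns, nms_field="nms"):
    """Build BigQuery DDL for a view keeping the latest row per key (hypothetical naming)."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    keys = ", ".join(f"`{c}`" for c in key_columns)
    source = f"`{project}.{dataset}.{table}`"
    view = f"`{project}.{dataset}.{table}_current`"
    # Rank rows within each key by descending nms, then keep only rank 1.
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        "SELECT * EXCEPT (_rn)\n"
        "FROM (\n"
        "  SELECT *,\n"
        f"    ROW_NUMBER() OVER (PARTITION BY {keys} ORDER BY `{nms_field}` DESC) AS _rn\n"
        f"  FROM {source}\n"
        ")\n"
        "WHERE _rn = 1"
    )


if __name__ == "__main__":
    print(nms_view_ddl("my-project", "my_dataset", "orders", ["id"]))
```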
./leftshove -config=./sample.env -seed -bq -cdc
- config: path to an environment variable file (required)
- seed: connect to the source database and automatically collect source table information (default: false)
- bq: automatically create dataset(s)/table(s) in BigQuery matching the source table schema with compatible types (default: false)
- cdc: run change data capture (default: false)
- runonce: iterate over the source tables only once (default: false)
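The `-config` flag takes an environment variable file. Its exact syntax and the variable names in `sample.env` are not listed here, so what follows is only a sketch of a typical `KEY=VALUE` parser. It can be used to sanity-check a config file before a run. The keys in the tests (`PG_HOST` and so on) are hypothetical.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines as in a typical .env file (a sketch, not leftshove's parser)."""
    env = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate shell-style "export KEY=VALUE".
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected KEY=VALUE, got {raw!r}")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key] = value
    return env


if __name__ == "__main__":
    with open("./sample.env") as f:
        print(sorted(parse_env_file(f.read())))
```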
- implement a way to define exceptions for the snapshot window field name (not_modified_since, nms, etc.); for now, the workaround is to run a mutation query in SQLite
- additional Benthos-supported outputs
- option for output to parquet file + S3/GCS
- option for output to BigQuery streaming/storage API insert
- option for BigQuery table partitioning at table creation
- option for output to any Benthos output
- handle source table name collisions; for now it is recommended to output each source to a separate BigQuery dataset
- fix Benthos logging to file
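For the snapshot window field name workaround listed above, the mutation is a plain SQL `UPDATE` against the local SQLite state. The table and column names below (`source_tables`, `nms_field`) are hypothetical, since the state schema is not documented here. Inspect the real schema first (for example with the `sqlite3` shell's `.schema` command) and adapt the statement to it.

```python
import sqlite3

# Hypothetical stand-in for leftshove's state schema; check the real one first.
DDL = """
CREATE TABLE IF NOT EXISTS source_tables (
    source_table TEXT PRIMARY KEY,
    nms_field    TEXT NOT NULL DEFAULT 'nms'
)
"""


def override_nms_field(conn, source_table, field):
    """Point one source table at a different snapshot window column.

    Returns the number of rows updated (0 means the table was not found).
    """
    cur = conn.execute(
        "UPDATE source_tables SET nms_field = ? WHERE source_table = ?",
        (field, source_table),
    )
    conn.commit()
    return cur.rowcount
```

Checking the returned row count catches a mistyped table name, which would otherwise update nothing without raising an error.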