clickhouse_loader.py : fixing temporal and binary data types #338

Merged: 5 commits, Nov 3, 2023


@aadant (Collaborator) commented Oct 24, 2023

This fixes the checksums for temporal types when the timezone is not UTC.

MySQL TIMESTAMP values are converted to DateTime64 in the client timezone.
MySQL DATETIME values are passed as String, so that they are converted in the server timezone.

For the migration to be consistent, the source and target timezones should be identical. This should not matter for TIMESTAMP columns, but it could cause conversion issues for DATETIME columns.
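
To illustrate why the timezones need to match for DATETIME but not for TIMESTAMP, here is a minimal sketch using only the Python standard library. The helper `interpret_naive` and the fixed +02:00 offset are hypothetical and are not part of clickhouse_loader.py; they only model a timezone-less string being interpreted by servers in two different zones.

```python
from datetime import datetime, timedelta, timezone


def interpret_naive(text: str, tz: timezone) -> datetime:
    """Attach a timezone to a timezone-less, DATETIME-style string."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)


UTC = timezone.utc
PLUS2 = timezone(timedelta(hours=2))  # hypothetical target server timezone

# DATETIME: the same wall-clock text becomes two different instants
# depending on which timezone interprets it.
wall_clock = "2023-10-24 12:00:00"
as_utc = interpret_naive(wall_clock, UTC)
as_plus2 = interpret_naive(wall_clock, PLUS2)
skew_seconds = (as_utc - as_plus2).total_seconds()

# TIMESTAMP: the value is an instant, so rendering it in another
# timezone changes the display but not the instant itself.
instant = datetime(2023, 10, 24, 12, 0, 0, tzinfo=UTC)
same_instant = instant.astimezone(PLUS2)

print(f"DATETIME skew between zones: {skew_seconds} seconds")
print(f"TIMESTAMP unchanged: {instant == same_instant}")
```

With mismatched timezones, the DATETIME values would be shifted by the offset between the two zones, which is enough to make the checksums differ.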

time python db_load/clickhouse_loader.py --clickhouse_host $CH_HOST --clickhouse_database data_types --dump_dir /home/aadant/dbdumps/data_types --clickhouse_user $CH_USER --clickhouse_password $CH_PASS --clickhouse_port 9000 --threads 16 --mysql_source_database data_types --mysqlshell --rmt_delete_support

Test:

python db_compare/mysql_table_checksum.py --mysql_host $MYSQL_HOST --mysql_user $MYSQL_USER --mysql_password $MYSQL_PASS --mysql_database $DATABASE --tables_regex . --threads 4 | grep "Checksum for table" | awk '{print $11" "$13" "$15}' | sort > mysql

python db_compare/clickhouse_table_checksum.py --clickhouse_host $CH_HOST --clickhouse_user $CH_USER --clickhouse_password $CH_PASS --clickhouse_database data_types --tables_regex . --threads 4 --sign_column "" | grep "Checksum for table" | awk '{print $11" "$13" "$15}' | sort > ch

diff ch mysql
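
As an alternative to reading raw `diff` output, the comparison can be sketched in Python to list only the tables that disagree. This assumes that the first field of each line in the `mysql` and `ch` files is the table name and that the remaining fields are the values to compare; the exact meaning of the three awk fields is not stated here, and both function names are hypothetical.

```python
def parse_checksums(lines):
    """Map the first field of each line (assumed: table name) to the rest."""
    result = {}
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            result[fields[0]] = tuple(fields[1:])
    return result


def mismatched_tables(mysql_lines, ch_lines):
    """Return the sorted names of tables whose values differ or are missing."""
    mysql = parse_checksums(mysql_lines)
    ch = parse_checksums(ch_lines)
    return sorted(
        table
        for table in mysql.keys() | ch.keys()
        if mysql.get(table) != ch.get(table)
    )


# Made-up sample lines, for illustration only.
sample_mysql = ["t_binary abc 10", "t_datetime def 5"]
sample_ch = ["t_binary abc 10", "t_datetime xyz 5"]
print(mismatched_tables(sample_mysql, sample_ch))
```

For the real files, the line lists could be read with `open("mysql").read().splitlines()` and `open("ch").read().splitlines()`.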

The binary types have also been addressed. Binary types are encoded as base64 by clickhouse_loader.py, while the checksum encoding is currently hex only (see FR #340 for the sink-connector).
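
The encoding mismatch can be shown with a short standard-library sketch: the same bytes produce different text in base64 and in hex, so a checksum computed over one representation will not match the other unless one side is re-encoded. The function name `base64_to_hex` and the sample bytes are illustrative only and are not taken from the loader or the checksum scripts.

```python
import base64


def base64_to_hex(encoded: str) -> str:
    """Decode a base64 string and re-encode the raw bytes as lowercase hex."""
    return base64.b64decode(encoded).hex()


raw = b"\x00\xff\x10"  # sample binary column value
as_base64 = base64.b64encode(raw).decode("ascii")
as_hex = raw.hex()

# The two text forms differ even though the underlying bytes are identical.
print(as_base64, as_hex)

# Re-encoding brings both sides back to a comparable representation.
print(base64_to_hex(as_base64) == as_hex)
```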

if 'datetime' == data_type or 'datetime(1)'== data_type or 'datetime(2)' == data_type or 'datetime(3)' == data_type:
# CH datetime range is not the same as MySQL https://clickhouse.com/docs/en/sql-reference/data-types/datetime64/
select += f"case when {column_name} > substr('2283-11-11 23:59:59.999', 1, length({column_name})) then TRIM(TRAILING '0' FROM CAST('2283-11-11 23:59:59.999' AS datetime(3))) else case when {column_name} <= '1925-01-01 00:00:00' then TRIM(TRAILING '.' FROM TRIM(TRAILING '0' FROM CAST('1925-01-01 00:00:00.000' AS datetime(3)))) else TRIM(TRAILING '.' FROM TRIM(TRAILING '0' FROM {column_name})) end end"
select += f"case when {column_name} > substr('2299-12-31 23:59:59.999', 1, length({column_name})) then substr(TRIM(TRAILING '0' FROM CAST('2299-12-31 23:59:59.999' AS datetime(3))),1,length({column_name})) else case when {column_name} <= '1900-01-01 00:00:00' then TRIM(TRAILING '.' FROM TRIM(TRAILING '0' FROM CAST('1900-01-01 00:00:00.000' AS datetime(3)))) else TRIM(TRAILING '.' FROM TRIM(TRAILING '0' FROM {column_name})) end end"
Collaborator


It would be nice if values such as 2299-12-31 were kept in a separate file or a separate variable, so that they can be updated easily.

Collaborator Author


Good point, this was tested with 23.3. @subkanthi do you know if there is a JDBC driver that supports those maximum values?
I will add it as a parameter, as it may change again.

Collaborator Author


@subkanthi Added --min_date_value and --max_date_value, please review!
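
For readers following the thread, here is a rough sketch of how the two options could feed the datetime clamp expression from the diff. Only the flag names come from this PR; the defaults shown (taken from the literals in the diff), the function names `build_parser` and `clamp_datetime_expr`, and the column name `created_at` are assumptions and may not match the merged code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="sketch of the date bound options")
    # Defaults are assumed from the literals in the diff, not from the merged code.
    parser.add_argument("--min_date_value", default="1900-01-01")
    parser.add_argument("--max_date_value", default="2299-12-31")
    return parser


def clamp_datetime_expr(column_name: str, min_date: str, max_date: str) -> str:
    """Build the MySQL expression that clamps a datetime column to the given bounds."""
    upper = f"{max_date} 23:59:59.999"
    lower = f"{min_date} 00:00:00"
    return (
        f"case when {column_name} > substr('{upper}', 1, length({column_name})) "
        f"then substr(TRIM(TRAILING '0' FROM CAST('{upper}' AS datetime(3))),1,length({column_name})) "
        f"else case when {column_name} <= '{lower}' "
        f"then TRIM(TRAILING '.' FROM TRIM(TRAILING '0' FROM CAST('{lower}.000' AS datetime(3)))) "
        f"else TRIM(TRAILING '.' FROM TRIM(TRAILING '0' FROM {column_name})) end end"
    )


# Example: override only the upper bound, as an older ClickHouse range would need.
args = build_parser().parse_args(["--max_date_value", "2283-11-11"])
expr = clamp_datetime_expr("created_at", args.min_date_value, args.max_date_value)
print(expr)
```

Keeping the bounds in one place means a future change to the supported DateTime64 range only needs new argument values, not an edit to the SQL string.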

@aadant changed the title from "Fixing temporal data types" to "clickhouse_loader.py : fixing temporal and binary data types" on Oct 28, 2023
@aadant requested a review from subkanthi on October 28, 2023 14:13
@aadant merged commit e203792 into Altinity:develop on Nov 3, 2023
1 of 3 checks passed