-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Param source timezone (review) #374
Conversation
It would be better if it didn't depend on the hard-coded <server>.<database>.<table> template
…rse a time_string with an offset marker e.g. +03:00, -0130, Z. Also we can handle an exception when a dateTime string has invalid year e.g. in MSSQL "0000-00-01 00:00:00". Zero year value is invalid in java.
This reverts commit e8cef17.
…es without tzinfo.
…lickhouse-sink-connector into param_source_timezone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation works for UTC but does not for other timezone if the maximum range is reached.
the solution is to let CH convert according the target data type.
structure +=" Nullable(String)" |
I wonder if this works for say Chicago to Chicago instead of UTC to UTC.
The source.datetime.timezone should not be defaulted to UTC (if not specified) but the debezium source timezone.
if (occurrence.contains("DateTime64")) { | ||
String[] params = StringUtils.substringBetween(occurrence, "(", ")").split(","); | ||
String precision = params[0].trim(); | ||
result.append(String.format("DateTime64(%s,\\'%s\\')", precision, timeZone)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
escape clashing with another escape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in #366
String dataType = ClickHouseUtils.escape(columnNameToDataTypeMap.get(entry.getKey()), '\'');
@@ -141,6 +147,29 @@ public ResultSet executeQueryWithResultSet(String sql) throws SQLException { | |||
|
|||
} | |||
|
|||
public String addTimeZoneToColumnDefinition(String typeName, String timeZone) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it is a good idea to use a regexp here as the target DateTime64 may already have a TZ !
example DateTime64(6,'UTC') becomes
DateTime64(6,'America/Chicago')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it works if you use a String instead, CH will do the conversion
Exception I got with the maximum boundaries
`db_to` DateTime64(6,\'America/Chicago\'),`_version` UInt64,`is_deleted` UInt8')
118741 2023-11-13 02:53:56.165 [pool-1-thread-9] ERROR com.altinity.clickhouse.sink.connector.db.DbWriter - ******* ERROR inserting Batch *****************
java.lang.IllegalArgumentException: DateTime(9904571999) should be between -1420070400 and 9904550399 inclusive of both values
at com.clickhouse.data.ClickHouseChecker.newException(ClickHouseChecker.java:17)
at com.clickhouse.data.ClickHouseChecker.between(ClickHouseChecker.java:109)
at com.clickhouse.data.format.BinaryDataProcessor$DateTime64SerDe.serialize(BinaryDataProcessor.java:292)
at com.clickhouse.data.ClickHouseDataProcessor.write(ClickHouseDataProcessor.java:534)
at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.addBatch(InputBasedPreparedStatement.java:345)
at com.altinity.clickhouse.sink.connector.db.DbWriter.addToPreparedStatementBatch(DbWriter.java:479)
at com.altinity.clickhouse.sink.connector.executor.ClickHouseBatchRunnable.flushRecordsToClickHouse(ClickHouseBatchRunnable.java:199)
at com.altinity.clickhouse.sink.connector.executor.ClickHouseBatchRunnable.processRecordsByTopic(ClickHouseBatchRunnable.java:169)
at com.altinity.clickhouse.sink.connector.executor.ClickHouseBatchRunnable.run(ClickHouseBatchRunnable.java:101)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite logical. We can take into account source.timezone for checkIfDateExceedsSupportedRange()
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that works. @subkanthi FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it is a good idea to use a regexp here as the target DateTime64 may already have a TZ !
example DateTime64(6,'UTC') becomes DateTime64(6,'America/Chicago')
Could you explain, please? I don't see any problem here (except possible slowness)). I specifically override it within input()
function to make Clickhouse use defined source.timezone
to parse incoming data instead of column metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the wrong place to change the schema. The sink-connector should create the schema with the correct timezone (target timezone).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the sink-connector, there are 2 places we fix the schema
- for the initial snapshot via custom tools like https://github.com/Altinity/clickhouse-sink-connector/pull/375/files
- via debezium, the sink-connector needs to add this
clickhouse.datetime.timezone
parameter. In your case, it would be the same as the source. default should be not specified and defaults todatabase.connectionTimeZone
(debezium)
source.datetime.timezone
should also default todatabase.connectionTimeZone
what do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in terms of timing, let us wait for @subkanthi to merge all pending PRs to develop, we will then address this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@subkanthi let us implement this too #349
.define( | ||
ClickHouseSinkConnectorConfigVariables.SOURCE_DATETIME_TIMEZONE.toString(), | ||
Type.STRING, | ||
"UTC", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it is a good idea. leave it null
by default and initialize with source timezone from Debezium.
Add to the default config.yaml (commented)
Thanks @IlyaTsoi for your contribution. |
Looks good to me apart from a clash with #366. I managed to replicate to UTC with this PR.