You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm trying to query data from Clickhouse using Spark jdbc connector.
I'm using some filters on timestamps. As a result I'm getting exception. Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH)
Steps to reproduce
Create clickhouse tables
Run following Spark code
Expected behaviour
Query run successfully
Code example
frompyspark.sqlimportSparkSession# Set up the SparkSession to include ClickHouse as a custom catalogspark=SparkSession.builder \
.appName("ClickHouse Catalog Example") \
.config("spark.jars.packages", "com.clickhouse:clickhouse-jdbc:0.6.5,org.apache.httpcomponents.client5:httpclient5:5.3.1") \
.config("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog") \
.config("spark.sql.catalog.clickhouse.host", "clickhouse-kim.clickhouse.svc") \
.config("spark.sql.catalog.clickhouse.http_port", "8123") \
.config("spark.sql.catalog.clickhouse.database", "telemetry") \
.config("spark.sql.catalog.clickhouse.driver", "com.clickhouse.jdbc.ClickHouseDriver") \
.config("spark.sql.catalog.clickhouse.user", "admin") \
.config("spark.sql.catalog.clickhouse.password", "admin") \
.getOrCreate()
jdbc_df=spark.read.jdbc("jdbc:clickhouse://clickhouse-kim.clickhouse.svc:8123/telemetry", "telemetry.reference_peaks", properties={"user": "admin", "password": "admin", "driver": "com.clickhouse.jdbc.ClickHouseDriver"})
jdbc_df.createOrReplaceTempView("reference_peaks")
# Parameters ts_start and ts_end (example values)ts_start='2024-09-10 23:00:00'ts_end='2024-09-10 23:04:59'# Spark SQL queryquery=f""" SELECT * FROM reference_peaks WHERE tts > to_timestamp('{ts_start}') - INTERVAL 100 SECOND AND tts < to_timestamp('{ts_end}') + INTERVAL 100 SECOND AND ts >= to_timestamp('{ts_start}') AND ts < to_timestamp('{ts_end}')"""# Execute the query using Spark SQLresult_df=spark.sql(query)
# Show the dataframe (for verification)result_df.count()
Error log
24[/10/04](http://localhost:8888/10/04) 10:48:51 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.BatchUpdateException: Code: 53. DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH) (version 24.9.1.3278 (official build))
at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107)
at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeAny(SqlBasedPreparedStatement.java:223)
at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeQuery(SqlBasedPreparedStatement.java:286)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:275)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base[/java.util.concurrent.ThreadPoolExecutor.runWorker](http://localhost:8888/java.util.concurrent.ThreadPoolExecutor.runWorker)(ThreadPoolExecutor.java:1136)
at java.base[/java.util.concurrent.ThreadPoolExecutor](http://localhost:8888/java.util.concurrent.ThreadPoolExecutor)$Worker.run(ThreadPoolExecutor.java:635)
at java.base[/java.lang.Thread.run](http://localhost:8888/java.lang.Thread.run)(Thread.java:840)
24[/10/04](http://localhost:8888/10/04) 10:48:51 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.135 executor driver): java.sql.BatchUpdateException: Code: 53. DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH) (version 24.9.1.3278 (official build))
Configuration
Environment
Client version: 0.6.5
OS: MacOS
ClickHouse server
ClickHouse Server version: 24.9.1.3278
CREATE TABLE statements for tables involved:
CREATE TABLE telemetry.reference_peaks_shard ON CLUSTER '{cluster}'
(
hash_1 String,
hash_2 String,
ts DateTime,
offset Int32,
channel Int32,
station_id String,
tts DateTime,
batchid Int32,
org_ts DateTime
)
ENGINE = MergeTree
ORDER BY ts;
CREATE TABLE telemetry.reference_peaks ON CLUSTER '{cluster}' AS telemetry.reference_peaks_shard
ENGINE = Distributed('{cluster}', 'telemetry', 'reference_peaks_shard', channel);
The text was updated successfully, but these errors were encountered:
Describe the bug
I'm trying to query data from Clickhouse using Spark jdbc connector.
I'm using some filters on timestamps. As a result I'm getting exception.
Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH)
Steps to reproduce
Expected behaviour
Code example
Error log
Configuration
Environment
ClickHouse server
CREATE TABLE
statements for tables involved:The text was updated successfully, but these errors were encountered: