Spark JDBC: DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime #1850

maver1ck commented Oct 4, 2024

Describe the bug

I'm trying to query data from ClickHouse using the Spark JDBC connector. When I apply filters on timestamp columns, the query fails with this exception:
Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH)
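
My reading of where that literal comes from (an educated guess, not verified against the driver source): the rejected value is exactly ts_start minus the 100-second interval from the tts filter, and Spark serializes pushed-down timestamp literals via java.sql.Timestamp.toString(), which appends a ".0" fractional second that ClickHouse's DateTime parser rejects by default. A minimal check of the arithmetic:

from datetime import datetime, timedelta

# 23:00:00 minus the 100-second interval from the tts filter yields the
# timestamp seen in the error; the pushed-down literal then carries a
# trailing ".0" (java.sql.Timestamp.toString() formatting).
ts_start = datetime(2024, 9, 10, 23, 0, 0)
print(ts_start - timedelta(seconds=100))  # -> 2024-09-10 22:58:20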

Steps to reproduce

  1. Create the ClickHouse tables (see the CREATE TABLE statements under Configuration below)
  2. Run the following Spark code

Expected behaviour

  • The query runs successfully

Code example

from pyspark.sql import SparkSession

# Set up the SparkSession to include ClickHouse as a custom catalog
spark = SparkSession.builder \
    .appName("ClickHouse Catalog Example") \
    .config("spark.jars.packages", "com.clickhouse:clickhouse-jdbc:0.6.5,org.apache.httpcomponents.client5:httpclient5:5.3.1") \
    .config("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog") \
    .config("spark.sql.catalog.clickhouse.host", "clickhouse-kim.clickhouse.svc") \
    .config("spark.sql.catalog.clickhouse.http_port", "8123") \
    .config("spark.sql.catalog.clickhouse.database", "telemetry") \
    .config("spark.sql.catalog.clickhouse.driver", "com.clickhouse.jdbc.ClickHouseDriver") \
    .config("spark.sql.catalog.clickhouse.user", "admin") \
    .config("spark.sql.catalog.clickhouse.password", "admin") \
    .getOrCreate()

jdbc_df = spark.read.jdbc(
    "jdbc:clickhouse://clickhouse-kim.clickhouse.svc:8123/telemetry",
    "telemetry.reference_peaks",
    properties={"user": "admin", "password": "admin", "driver": "com.clickhouse.jdbc.ClickHouseDriver"},
)
jdbc_df.createOrReplaceTempView("reference_peaks")

# Parameters ts_start and ts_end (example values)
ts_start = '2024-09-10 23:00:00'
ts_end = '2024-09-10 23:04:59'

# Spark SQL query
query = f"""
    SELECT *
    FROM reference_peaks
    WHERE tts > to_timestamp('{ts_start}') - INTERVAL 100 SECOND
    AND tts < to_timestamp('{ts_end}') + INTERVAL 100 SECOND
    AND ts >= to_timestamp('{ts_start}')
    AND ts < to_timestamp('{ts_end}')
"""

# Execute the query using Spark SQL
result_df = spark.sql(query)

# Trigger execution of the query (for verification)
result_df.count()
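
A possible workaround, sketched but not verified on this setup: disable predicate pushdown so Spark evaluates the timestamp filters itself instead of sending DateTime literals to ClickHouse, at the cost of transferring more rows. pushDownPredicate is a standard Spark JDBC option; everything else below mirrors the reproduction code.

# Workaround sketch: read via the options API so pushDownPredicate can be
# set. With pushdown disabled, Spark fetches rows and applies the
# timestamp filters locally, so no DateTime literal reaches ClickHouse.
jdbc_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:clickhouse://clickhouse-kim.clickhouse.svc:8123/telemetry")
    .option("dbtable", "telemetry.reference_peaks")
    .option("user", "admin")
    .option("password", "admin")
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
    .option("pushDownPredicate", "false")
    .load()
)
jdbc_df.createOrReplaceTempView("reference_peaks")

Alternatively, the ClickHouse setting date_time_input_format = 'best_effort' makes the server accept fractional-second strings such as '2024-09-10 22:58:20.0'; I haven't verified the best way to pass it through the JDBC driver, so treat that as an untested idea.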

Error log


24/10/04 10:48:51 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.BatchUpdateException: Code: 53. DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH) (version 24.9.1.3278 (official build))

	at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107)
	at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeAny(SqlBasedPreparedStatement.java:223)
	at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeQuery(SqlBasedPreparedStatement.java:286)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:275)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
24/10/04 10:48:51 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.135 executor driver): java.sql.BatchUpdateException: Code: 53. DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH) (version 24.9.1.3278 (official build))
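
To confirm that the failing literals come from filter pushdown (a hedged check; this assumes the plan prints as in stock Spark), the physical plan can be inspected. Pushed timestamp comparisons should appear under PushedFilters in the scan over the JDBC relation:

# If pushdown is the trigger, the tts/ts comparisons will be listed in the
# PushedFilters section of the JDBCRelation scan node.
result_df.explain(True)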

Configuration

Environment

  • Client version: 0.6.5
  • OS: macOS

ClickHouse server

  • ClickHouse Server version: 24.9.1.3278
  • CREATE TABLE statements for tables involved:
CREATE TABLE telemetry.reference_peaks_shard ON CLUSTER '{cluster}'
(
    hash_1 String,
    hash_2 String,
    ts DateTime,
    offset Int32,
    channel Int32,
    station_id String,
    tts DateTime,
    batchid Int32,
    org_ts DateTime
)
ENGINE = MergeTree
ORDER BY ts;

CREATE TABLE telemetry.reference_peaks ON CLUSTER '{cluster}' AS telemetry.reference_peaks_shard
ENGINE = Distributed('{cluster}', 'telemetry', 'reference_peaks_shard', channel);
maver1ck added the bug label Oct 4, 2024
chernser added this to the Priority Backlog milestone Oct 7, 2024