Spark JDBC: DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime #1850

maver1ck commented Oct 4, 2024

Describe the bug

I'm trying to query data from ClickHouse using the Spark JDBC connector. When I apply filters on timestamp columns, the query fails with this exception:
Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH)
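
My reading of where that literal comes from (an educated guess, not verified against the driver source): the rejected value is exactly ts_start minus the 100-second interval from the tts filter, and Spark serializes pushed-down timestamp literals via java.sql.Timestamp.toString(), which appends a ".0" fractional second that ClickHouse's DateTime parser rejects by default. A minimal check of the arithmetic:

from datetime import datetime, timedelta

# 23:00:00 minus the 100-second interval from the tts filter yields the
# timestamp seen in the error; the pushed-down literal then carries a
# trailing ".0" (java.sql.Timestamp.toString() formatting).
ts_start = datetime(2024, 9, 10, 23, 0, 0)
print(ts_start - timedelta(seconds=100))  # -> 2024-09-10 22:58:20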

Steps to reproduce

  1. Create the ClickHouse tables (see the CREATE TABLE statements under Configuration below)
  2. Run the following Spark code

Expected behaviour

  • The query runs successfully

Code example

from pyspark.sql import SparkSession

# Set up the SparkSession to include ClickHouse as a custom catalog
spark = SparkSession.builder \
    .appName("ClickHouse Catalog Example") \
    .config("spark.jars.packages", "com.clickhouse:clickhouse-jdbc:0.6.5,org.apache.httpcomponents.client5:httpclient5:5.3.1") \
    .config("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog") \
    .config("spark.sql.catalog.clickhouse.host", "clickhouse-kim.clickhouse.svc") \
    .config("spark.sql.catalog.clickhouse.http_port", "8123") \
    .config("spark.sql.catalog.clickhouse.database", "telemetry") \
    .config("spark.sql.catalog.clickhouse.driver", "com.clickhouse.jdbc.ClickHouseDriver") \
    .config("spark.sql.catalog.clickhouse.user", "admin") \
    .config("spark.sql.catalog.clickhouse.password", "admin") \
    .getOrCreate()

jdbc_df = spark.read.jdbc(
    "jdbc:clickhouse://clickhouse-kim.clickhouse.svc:8123/telemetry",
    "telemetry.reference_peaks",
    properties={"user": "admin", "password": "admin", "driver": "com.clickhouse.jdbc.ClickHouseDriver"},
)
jdbc_df.createOrReplaceTempView("reference_peaks")

# Parameters ts_start and ts_end (example values)
ts_start = '2024-09-10 23:00:00'
ts_end = '2024-09-10 23:04:59'

# Spark SQL query
query = f"""
    SELECT *
    FROM reference_peaks
    WHERE tts > to_timestamp('{ts_start}') - INTERVAL 100 SECOND
    AND tts < to_timestamp('{ts_end}') + INTERVAL 100 SECOND
    AND ts >= to_timestamp('{ts_start}')
    AND ts < to_timestamp('{ts_end}')
"""

# Execute the query using Spark SQL
result_df = spark.sql(query)

# Trigger execution of the query (for verification)
result_df.count()
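
A possible workaround, sketched but not verified on this setup: disable predicate pushdown so Spark evaluates the timestamp filters itself instead of sending DateTime literals to ClickHouse, at the cost of transferring more rows. pushDownPredicate is a standard Spark JDBC option; everything else below mirrors the reproduction code.

# Workaround sketch: read via the options API so pushDownPredicate can be
# set. With pushdown disabled, Spark fetches rows and applies the
# timestamp filters locally, so no DateTime literal reaches ClickHouse.
jdbc_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:clickhouse://clickhouse-kim.clickhouse.svc:8123/telemetry")
    .option("dbtable", "telemetry.reference_peaks")
    .option("user", "admin")
    .option("password", "admin")
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
    .option("pushDownPredicate", "false")
    .load()
)
jdbc_df.createOrReplaceTempView("reference_peaks")

Alternatively, the ClickHouse setting date_time_input_format = 'best_effort' makes the server accept fractional-second strings such as '2024-09-10 22:58:20.0'; I haven't verified the best way to pass it through the JDBC driver, so treat that as an untested idea.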

Error log


24/10/04 10:48:51 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.BatchUpdateException: Code: 53. DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH) (version 24.9.1.3278 (official build))

	at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107)
	at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeAny(SqlBasedPreparedStatement.java:223)
	at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeQuery(SqlBasedPreparedStatement.java:286)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:275)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
24/10/04 10:48:51 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.135 executor driver): java.sql.BatchUpdateException: Code: 53. DB::Exception: Cannot convert string '2024-09-10 22:58:20.0' to type DateTime. (TYPE_MISMATCH) (version 24.9.1.3278 (official build))
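
To confirm that the failing literals come from filter pushdown (a hedged check; this assumes the plan prints as in stock Spark), the physical plan can be inspected. Pushed timestamp comparisons should appear under PushedFilters in the scan over the JDBC relation:

# If pushdown is the trigger, the tts/ts comparisons will be listed in the
# PushedFilters section of the JDBCRelation scan node.
result_df.explain(True)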

Configuration

Environment

  • Client version: 0.6.5
  • OS: macOS

ClickHouse server

  • ClickHouse Server version: 24.9.1.3278
  • CREATE TABLE statements for tables involved:
CREATE TABLE telemetry.reference_peaks_shard ON CLUSTER '{cluster}'
(
    hash_1 String,
    hash_2 String,
    ts DateTime,
    offset Int32,
    channel Int32,
    station_id String,
    tts DateTime,
    batchid Int32,
    org_ts DateTime
)
ENGINE = MergeTree
ORDER BY ts;

CREATE TABLE telemetry.reference_peaks ON CLUSTER '{cluster}' AS telemetry.reference_peaks_shard
ENGINE = Distributed('{cluster}', 'telemetry', 'reference_peaks_shard', channel);
maver1ck added the bug label Oct 4, 2024
chernser added this to the Priority Backlog milestone Oct 7, 2024