Lack of array type support in binary mode #366

lika310 · 2024-11-06T12:01:55Z

Is your feature request related to a problem? Please describe.
In binary mode (with "spark.clickhouse.read.format" set to "binary"), the following exception is thrown when selecting a ClickHouse column of type Array(String).

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
	at org.apache.spark.sql.catalyst.util.GenericArrayData.getUTF8String(GenericArrayData.scala:73)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
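
The top frame, GenericArrayData.getUTF8String, suggests that the array elements reach Spark as plain java.lang.String values, whereas Spark's internal rows expect UTF8String for string data. The sketch below reproduces that cast failure in isolation and shows the conversion that avoids it. It is only an illustration under that assumption, not the connector's actual code: `ArrayStringCastSketch`, `rawArray`, and `convertedArray` are hypothetical names, and it needs spark-catalyst on the classpath.

```scala
import org.apache.spark.sql.catalyst.util.GenericArrayData
import org.apache.spark.unsafe.types.UTF8String
import scala.util.Try

// Hypothetical helper object, not part of the connector.
object ArrayStringCastSketch {

  // Assumption: the binary reader hands decoded java.lang.String
  // elements to Spark unchanged.
  def rawArray(values: Seq[String]): GenericArrayData =
    new GenericArrayData(values.toArray[Any])

  // Converting each element to Spark's internal string type first.
  def convertedArray(values: Seq[String]): GenericArrayData =
    new GenericArrayData(values.map(v => UTF8String.fromString(v)).toArray[Any])

  def main(args: Array[String]): Unit = {
    val values = Seq("a", "b")

    // Reading a java.lang.String element as UTF8String is expected to fail
    // with a ClassCastException, as in the trace above.
    val raw = Try(rawArray(values).getUTF8String(0))
    println(s"raw read failed: ${raw.isFailure}")

    // After conversion the same read succeeds.
    println(s"converted read: ${convertedArray(values).getUTF8String(0)}")
  }
}
```

If this assumption holds, a fix in the binary reader would likely amount to converting string elements of arrays the same way scalar String columns are converted.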

Describe the solution you'd like
Array(String) columns should be read successfully in binary mode.

Describe alternatives you've considered
I considered using json mode instead, but it fails when reading FixedString columns. As a result, neither mode lets me read all of the columns.

Additional context

lika310 added the "enhancement" (New feature or request) label on Nov 6, 2024
