[question] Is there an option on connector to manage/configure inserts to multiple partitions? #368
Comments (truncated excerpts)
the error code 1002 means …

I'm not sure if you're aware of this article, which introduces the general idea of write optimization. The key idea is …
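The excerpt above is cut off, but the standard write optimization for ClickHouse is to make each INSERT batch touch as few table partitions as possible. A minimal Spark sketch of that idea, assuming a hypothetical table partitioned by an `event_date` column (the table and column names are illustrative, not from this thread):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().getOrCreate()

// Hypothetical source data; "event_date" stands in for the ClickHouse
// table's partition key (e.g. PARTITION BY toYYYYMMDD(event_date)).
val df = spark.read.parquet("hdfs:///path/to/source")

df
  .repartition(col("event_date"))          // group rows by partition key
  .sortWithinPartitions(col("event_date")) // keep each task's batches partition-local
  .writeTo("clickhouse.db.target_table")   // DataFrameWriterV2 through the catalog
  .append()
```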
I noticed you have disabled …

Thanks for the fast reply. All I have is the error trace from the failed Spark task logs …

Yes, sure, I disabled …

In all of these cases I saw approximately the same results at the write stage. Maybe I missed something, but I didn't notice any impact from …

Well, after additional investigation I have achieved the necessary performance.
Original post

Hi,
In our process, we use the connector to transfer data from HDFS to ClickHouse via Spark, and this process overwrites a large number of partitions (with little data in each of them).
The target table in ClickHouse is a Distributed table backed by ReplicatedMergeTree tables on 6 nodes.
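For this topology, the connector exposes knobs that control whether inserts go through the Distributed table or directly to the underlying local tables. A hedged sketch, using the option names as they appear in ClickHouseSQLConf for 0.7.x (the table name is illustrative):

```scala
// Spread writes across all cluster nodes rather than a single endpoint.
spark.conf.set("spark.clickhouse.write.distributed.useClusterNodes", "true")
// Convert writes against the Distributed table into writes against the
// shard-local ReplicatedMergeTree tables, skipping the Distributed fan-out.
spark.conf.set("spark.clickhouse.write.distributed.convertLocal", "true")

df.writeTo("clickhouse.db.events_distributed").append()
```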
Lately, we have increasingly been running into a problem where Spark tasks crash with an error.
We are using the following connector configuration in SparkConf:
Spark version - 3.3.0
Connector version - 0.7.3 (jdbc - 0.4.6)
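The actual SparkConf block from the post was not captured in this excerpt. For reference, a typical catalog registration for connector 0.7.x looks roughly like the following (host, credentials, and database are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Catalog implementation shipped with connector 0.7.x.
  .set("spark.sql.catalog.clickhouse", "xenon.clickhouse.ClickHouseCatalog")
  .set("spark.sql.catalog.clickhouse.host", "clickhouse-host") // placeholder
  .set("spark.sql.catalog.clickhouse.protocol", "http")
  .set("spark.sql.catalog.clickhouse.http_port", "8123")
  .set("spark.sql.catalog.clickhouse.user", "default")         // placeholder
  .set("spark.sql.catalog.clickhouse.password", "")            // placeholder
  .set("spark.sql.catalog.clickhouse.database", "default")
```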
One thing we tried was to play with the batch_size per insert, but it is still not always possible to achieve stability and reliability for the jobs.
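The batch-size knob mentioned above, together with the connector's retry settings, can be tuned in the same SparkConf. A sketch with illustrative values (the option names come from ClickHouseSQLConf; the values shown are the defaults as documented, not recommendations):

```scala
// Rows per INSERT batch (ClickHouseSQLConf default is 10000).
conf.set("spark.clickhouse.write.batchSize", "10000")
// Retries for failed batches, and the pause between attempts in seconds.
conf.set("spark.clickhouse.write.maxRetry", "3")
conf.set("spark.clickhouse.write.retryInterval", "10")
// Server error codes treated as retryable (241 = MEMORY_LIMIT_EXCEEDED).
conf.set("spark.clickhouse.write.retryableErrorCodes", "241")
```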
We didn't find any documentation of options and connector configurations for cases like this.
The only reference is the config object in the source code, which we used for configuration: https://github.com/ClickHouse/spark-clickhouse-connector/blob/v0.7.3/spark-3.3/clickhouse-spark/src/main/scala/org/apache/spark/sql/clickhouse/ClickHouseSQLConf.scala
Could you please advise whether there are any connector options/configs through which we can manage the scenarios above, or JDBC driver options we could configure to make inserts stable?
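For the question in the title specifically, the conf file linked above appears to expose partition-aware write options. A hedged sketch of the ones that look relevant:

```scala
// Repartition the DataFrame by the ClickHouse partition key before writing,
// so each task's batches touch as few table partitions as possible.
conf.set("spark.clickhouse.write.repartitionByPartition", "true")
// Sort rows within each Spark task by partition value before batching.
conf.set("spark.clickhouse.write.localSortByPartition", "true")
// Optionally also sort by the table's sorting key within each batch.
conf.set("spark.clickhouse.write.localSortByKey", "true")
```

Together these make the connector do automatically what the manual repartition-and-sort sketch earlier in the thread does by hand; they are the natural first thing to check for the many-small-partitions scenario described here.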