Is there a way we can run a DML Query (INSERT/MERGE) via BigQuery spark connector? #575
Comments
INSERT can be run by creating a DataFrame and saving it to BigQuery. MERGE is not supported at the moment - I'd appreciate seeing a use case.
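A minimal sketch of the INSERT path described above, assuming the spark-bigquery-connector jar is on the classpath; the project, dataset, table, and bucket names are placeholders:

```python
# Sketch (not a definitive implementation): with the spark-bigquery-connector,
# saving a DataFrame in "append" mode behaves like an INSERT into the table.
BQ_TABLE = "my_project.my_dataset.my_table"  # placeholder target table

def append_to_bigquery(df, table=BQ_TABLE, gcs_bucket="my-staging-bucket"):
    """Append (i.e. INSERT) the rows of a Spark DataFrame into a BigQuery table.

    `gcs_bucket` is a placeholder; indirect writes stage files in GCS
    before loading them into BigQuery.
    """
    (df.write
       .format("bigquery")
       .option("table", table)
       .option("temporaryGcsBucket", gcs_bucket)
       .mode("append")
       .save())
```

In a Spark job this would be called on any DataFrame, e.g. `append_to_bigquery(spark.createDataFrame([(1, "alice")], ["id", "name"]))`.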
We are trying to do CDC from a traditional RDBMS to BigQuery via Kafka: Source -> Kafka -> Spark Structured Streaming -> BigQuery
@davidrabinowitz Merge support in the Spark connector would be greatly appreciated. It would help avoid running a separate job using a SQL MERGE (https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement). The use case, as mentioned by @spadhi7, is to apply the changes in a source table to a table in BigQuery by utilising the change feed, which provides the inserts, updates and deletes made to the source table. AFAIK, MERGE is supported in Spark only for the Delta format. It would be great if the BigQuery connector supported it too.
Thanks for the suggestion - I agree this is a great idea. However, at the moment we try to use only Spark APIs, without any proprietary APIs from our end. We will review this and see what is the best way to implement the merge functionality.
It would be a great feature! In my current situation, I need to update a table with a daily delta that may originate either from new inserts or from updates to existing records. In order to keep the BigQuery table updated, I have to upload the changes to a staging table and then launch a separate merge query. It would be optimal if that could be done directly in the Spark connector. I agree, anyway, that this feature is hard to implement, since it would add complexity to the load job.
This would be similar to Delta.io. It would be nice if the connector could support it.
As @nicodds mentioned, we're also using the same approach: write the new changes to a temp table using the connector, then run a BQ SQL query to do the merge, and finally drop the temp table. It would be nice to do this directly instead. I wonder if the connector can implement this feature in a way similar to how Iceberg does it: https://iceberg.apache.org/docs/latest/spark-writes/#merge-into
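The staging-table workaround described in this thread can be sketched roughly as below. This is an assumption-laden illustration, not connector functionality: `build_merge_sql` is a hypothetical helper, the table names are placeholders, and the MERGE/DROP step uses the `google-cloud-bigquery` client rather than Spark (step 1, writing the delta to the staging table, would be a normal connector `append` write):

```python
# Sketch of the temp-table + MERGE workaround (hypothetical helper names;
# table identifiers are placeholders). Step 1 -- writing the delta to the
# staging table via the Spark connector -- is omitted here.
def build_merge_sql(target, staging, key_cols, value_cols):
    """Build a BigQuery MERGE statement that upserts `staging` into `target`."""
    on = " AND ".join(f"T.{c} = S.{c}" for c in key_cols)
    sets = ", ".join(f"T.{c} = S.{c}" for c in value_cols)
    cols = ", ".join(key_cols + value_cols)
    vals = ", ".join(f"S.{c}" for c in key_cols + value_cols)
    return (
        f"MERGE `{target}` T USING `{staging}` S ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {sets} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )

def merge_and_cleanup(target, staging, key_cols, value_cols):
    """Steps 2 and 3 of the workaround: run the MERGE, then drop staging."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()
    client.query(build_merge_sql(target, staging, key_cols, value_cols)).result()
    client.delete_table(staging)
```

For example, `merge_and_cleanup("proj.ds.target", "proj.ds.staging", ["id"], ["name"])` would upsert the staging rows by `id` and then remove the staging table.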
Hi, just wanted to know if the merge feature is available now, as I couldn't find anything in the docs.