Is there a way we can run a DML query (INSERT/MERGE) via the BigQuery Spark connector? #575

Open
spadhi7 opened this issue Mar 27, 2022 · 8 comments
Labels: enhancement (New feature or request)

spadhi7 commented Mar 27, 2022

No description provided.

davidrabinowitz (Member) commented

INSERT can be run by creating a DataFrame and saving it to BigQuery.
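
For example, something along these lines should work (a minimal sketch; the table, dataset, and bucket names are placeholders, and the spark-bigquery connector jar is assumed to be on the classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-insert").getOrCreate()

# Rows to "INSERT"; in practice this comes from your source data.
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# Appending the DataFrame is the connector's equivalent of an INSERT.
(df.write.format("bigquery")
    .option("table", "my_dataset.my_table")          # placeholder target table
    .option("temporaryGcsBucket", "my-temp-bucket")  # needed for the indirect write path
    .mode("append")
    .save())
```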

MERGE is not supported at the moment - I'd appreciate seeing a use case.

davidrabinowitz self-assigned this Mar 27, 2022

spadhi7 (Author) commented Mar 27, 2022

We are trying to do CDC from a traditional RDBMS to BigQuery via Kafka: Source -> Kafka -> Spark Structured Streaming -> BigQuery.
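
For context, the sink stage looks roughly like the sketch below (the topic, schema, broker, and table names are placeholders). Appends work fine; the problem is applying the updates and deletes from the change feed, which is where MERGE would come in:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("cdc-to-bq").getOrCreate()

# Hypothetical change-event schema; ours carries an op field (I/U/D).
schema = StructType([
    StructField("id", LongType()),
    StructField("name", StringType()),
    StructField("op", StringType()),
])

events = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("subscribe", "cdc-topic")                  # placeholder
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

# The connector can append the stream, but that only covers inserts;
# the updates and deletes in the change feed have nowhere to go.
query = (events.writeStream.format("bigquery")
    .option("table", "my_dataset.my_table")
    .option("temporaryGcsBucket", "my-temp-bucket")
    .option("checkpointLocation", "gs://my-temp-bucket/checkpoints/cdc")
    .start())
query.awaitTermination()
```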

joydeepml commented Sep 13, 2022

@davidrabinowitz MERGE support in the Spark connector would be greatly appreciated. It would help avoid running a separate job using a SQL MERGE statement: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement

The use case, as mentioned by @spadhi7, is to apply the changes in a source table to a table in BigQuery by utilising the change feed, which provides the inserts, updates, and deletes in the source table.

AFAIK, merge is supported in Spark only for the Delta format:
https://docs.delta.io/latest/delta-update.html#language-python

It would be great if the BigQuery connector supported that too.
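
For reference, this is roughly what the Delta merge API from the link above looks like (paths, keys, and the op column are placeholders):

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-merge").getOrCreate()

changes = spark.read.parquet("/tmp/changes")       # placeholder change feed
target = DeltaTable.forPath(spark, "/tmp/target")  # placeholder Delta table

(target.alias("t")
    .merge(changes.alias("s"), "t.id = s.id")
    .whenMatchedDelete(condition="s.op = 'D'")  # apply source deletes
    .whenMatchedUpdateAll()                     # apply source updates
    .whenNotMatchedInsertAll()                  # apply source inserts
    .execute())
```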

davidrabinowitz (Member) commented

Thanks for the suggestion - I agree this is a great idea. However, at the moment we try to use only Spark APIs, without any proprietary APIs on our end. We will review this and see what is the best way to implement the merge functionality.

nicodds commented Mar 29, 2023

It would be a great feature!

In my current situation, I need to update a table with a daily delta that may originate either from new inserts or from updates to existing records. To keep the BigQuery table up to date, I have to upload the changes to a staging table and then launch a separate merge query. It would be optimal if that could be done directly in the Spark connector.
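
Concretely, the workaround looks something like the sketch below (the dataset, table, bucket, and column names are placeholders, and the MERGE condition depends on your keys):

```python
from google.cloud import bigquery
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-merge").getOrCreate()
delta_df = spark.read.parquet("gs://my-bucket/daily_delta/")  # placeholder delta source

# Step 1: upload today's changes to a staging table via the connector.
(delta_df.write.format("bigquery")
    .option("table", "my_dataset.staging_table")
    .option("temporaryGcsBucket", "my-temp-bucket")
    .mode("overwrite")
    .save())

# Step 2: launch the separate merge query against the target table.
client = bigquery.Client()
client.query("""
    MERGE `my_dataset.target_table` t
    USING `my_dataset.staging_table` s
    ON t.id = s.id
    WHEN MATCHED THEN
      UPDATE SET name = s.name, updated_at = s.updated_at
    WHEN NOT MATCHED THEN
      INSERT (id, name, updated_at) VALUES (s.id, s.name, s.updated_at)
""").result()
```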

I agree, though, that this feature would be hard to implement, since it would add complexity to the load job.

h5chauhan commented Apr 5, 2023

This would be similar to Delta.io. It would be nice if the connector could support it.

khaledh commented Apr 6, 2023

As @nicodds mentioned, we're using the same approach: write the new changes to a temp table using the connector, run a BQ SQL query to do the merge, and finally drop the temp table. It would be nice to do this directly instead.

I wonder if the connector can implement this feature in a way similar to how Iceberg does it: https://iceberg.apache.org/docs/latest/spark-writes/#merge-into
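
Something along these lines (Iceberg's current syntax, with placeholder catalog/table/view names) would replace the temp-table dance entirely:

```python
from pyspark.sql import SparkSession

# Assumes Iceberg's runtime jar and SQL extensions are configured.
spark = (SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate())

# `updates` would be a temp view over the change feed; names are placeholders.
spark.sql("""
    MERGE INTO catalog.db.target t
    USING updates s
    ON t.id = s.id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```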

ajaybiswal commented

Hi, just wanted to know if the merge feature is available now, as I couldn't find anything in the docs.
