BigQuery Pushdown filtering on Spark 3.4.2 #1207
Comments
@sid-habu Currently, pushdown is only supported up to Spark 3.3. |
@isha97 In that case, can you please confirm that if I use the workaround of passing in a raw query, the filtering will be executed in BigQuery?
|
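A minimal sketch of the raw-query workaround asked about above, in PySpark. It assumes the connector's documented read options `query`, `viewsEnabled` and `materializationDataset`; the project, dataset, cutoff date and `tmp_dataset` names are hypothetical. Because the WHERE clause is part of the SQL that BigQuery runs, the filtering here does not rely on Spark pushing anything down.

```python
from typing import Dict


def bigquery_query_options(sql: str, materialization_dataset: str) -> Dict[str, str]:
    """Build reader options for loading the result of a SQL query.

    The option names are taken from the connector README as I understand it;
    check them against the connector version you actually run.
    """
    if not sql.strip():
        raise ValueError("sql must not be empty")
    return {
        "viewsEnabled": "true",
        "materializationDataset": materialization_dataset,
        "query": sql,
    }


def load_filtered(spark, sql: str, materialization_dataset: str):
    """Load the query result as a DataFrame.

    Needs a SparkSession that has the spark-bigquery connector on its classpath.
    """
    reader = spark.read.format("bigquery")
    for key, value in bigquery_query_options(sql, materialization_dataset).items():
        reader = reader.option(key, value)
    return reader.load()


# Hypothetical table path and cutoff date, for illustration only.
SQL = "SELECT * FROM `my-project.my_dataset.foo` WHERE bar_date >= DATE '2023-01-01'"
```

With a live session this would be called as `load_filtered(spark, SQL, "tmp_dataset")`. As discussed further down the thread, loading from a query goes through a temporary table, so this trades pushdown for a materialization step.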
@isha97 is it true that filters aren't supported in Spark 3.4+? I realise However, when I look in the BigQuery console and at the project history, I see a This is when using |
@tom-s-powell Filter pushdown is enabled by default in all the connector flavors and cannot be disabled. You can track the usage of filters in the application log - both the filters the connector gets from Spark and the compiled filter are logged under the Regarding |
I see, thank you. So in the case of One use-case we have for using EDIT: One other question would be around limits, and whether that is a pushdown capability? |
When loading from query ( In the If the question is whether we can push down df.head(20) as the equivalent of |
Thanks for the explanation. And there's no way of using the BigQuery Storage Read API to query time travel without creating a temporary table? I'm assuming there's cost associated with that. The other case we have is for partitioned tables. We have had reports that partitions are not pruned, but I assumed that is because we are using |
@davidrabinowitz @isha97 Is there a timeline when this feature will be released for spark 3.5? We actually have a strong use-case for this requirement and without this, we're incurring huge unnecessary costs. I've already seen a couple pending / closed issues related to this feature request. |
An additional question related to the previous comment. Is the pushdown only supported on |
@davidrabinowitz Following up on the previous questions. |
Hi @davidrabinowitz, Are there any plans to add support for Spark 3.4 and 3.5? If so, could you please share the timeline or any progress updates on this? Thanks! |
I have a BigQuery table foo with a DATE column bar_date. I am trying to query this table in Spark 3.4.2 using the spark-bigquery-with-dependencies:0.30.0 connector.

I am unable to get the pushdown filtering to work, as the physical plan shows PushedFilters: [] and pulls in all the data from BQ before doing the filtering in Spark.

Below is my code. I even tried enabling BigQueryConnectorUtils.enablePushdownSession(spark) but found that it isn't supported yet for Spark 3.4+.

Physical plan after stripping the table name and requireColumns. The filter list is empty in the plan as-is.
I am sure I am missing something trivial as I expect this simple filtering to be pushed down to BigQuery.
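For readers who want to repeat the check described above, here is a small PySpark-side sketch (not the original poster's code) that captures the text printed by `df.explain()` and pulls out the `PushedFilters` entry. The helper names are hypothetical, and the parsing is deliberately naive.

```python
import contextlib
import io
import re
from typing import Optional


def pushed_filters(plan_text: str) -> Optional[str]:
    """Return the text inside the first `PushedFilters: [...]`, or None if absent.

    Simplification: the match stops at the first closing bracket, so a filter
    that itself contains `]` would be cut short.
    """
    match = re.search(r"PushedFilters: \[(.*?)\]", plan_text)
    return match.group(1) if match else None


def explain_text(df) -> str:
    """Capture what `df.explain()` prints. Needs a live PySpark DataFrame."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain()
    return buffer.getvalue()
```

With a live DataFrame, `pushed_filters(explain_text(df))` returns an empty string when the plan shows `PushedFilters: []`. Note that, per the maintainer's reply in the comments, the connector logs the filters it receives from Spark in the application log, so an empty list in the plan text may not be the whole story.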