[FEA] (Discussion) Shall we push down the filter to cudf ParquetReader ? #11881

sperlingxx · 2024-12-17T02:01:48Z

Is your feature request related to a problem? Please describe.
This is just an early-stage discussion without any specific plan.

Currently, the cuDF Parquet reader supports reading with filter expressions (https://github.com/rapidsai/cudf/blob/branch-25.02/cpp/include/cudf/io/parquet.hpp#L251). Although filter pushdown does NOT seem to reduce the cost of materialization through row-level read skipping, it might still be useful as a prerequisite for a potential upcoming feature: pruning the FilterExec and the following CoalescingBatchExec when all filters can be pushed down.
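
To make the pruning idea concrete, here is a minimal sketch, not plugin code. It assumes the reader applies every pushed predicate exactly at the row level, so a FilterExec is only needed when some predicate cannot be pushed down. `Predicate`, `PUSHABLE_OPS`, `split_predicates`, and `needs_filter_exec` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Predicate:
    """A simple `column <op> literal` predicate (hypothetical model)."""
    column: str
    op: str
    value: object


# Assumption: the operators the reader-side filter is able to evaluate.
PUSHABLE_OPS = {"==", "!=", "<", "<=", ">", ">="}


def split_predicates(preds: List[Predicate]) -> Tuple[List[Predicate], List[Predicate]]:
    """Split predicates into those pushed to the reader and the residual ones."""
    pushed = [p for p in preds if p.op in PUSHABLE_OPS]
    residual = [p for p in preds if p.op not in PUSHABLE_OPS]
    return pushed, residual


def needs_filter_exec(preds: List[Predicate]) -> bool:
    """A separate filter step is only required if some predicate stays behind."""
    _, residual = split_predicates(preds)
    return bool(residual)
```

If `needs_filter_exec` returns `False`, the filter node (and, under the same assumption, the coalescing step that follows it) could be dropped from the plan.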

sperlingxx · 2024-12-17T02:04:45Z

May I ask for your perspective on this issue? @revans2 @jlowe @winningsix @GaryShen2008

jlowe · 2024-12-17T14:45:35Z

Yes, eventually we want to push the filter predicate into the cudf reader to help avoid materialization. However, there is currently no benefit to doing this work, because cudf only uses the predicate to filter row groups, and we're already doing that in the Spark plugin. We're also applying predicates in ways that cudf does not (e.g., filtering dictionaries to see if the rest of the row group should be skipped).

When cudf starts using the filter predicate to avoid decompressing or decoding column pages, then yes, we should definitely translate as much of the predicate to cudf as we can.
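
The two kinds of row-group skipping mentioned above (statistics-based filtering and a dictionary check) can be sketched roughly as follows. This is an illustrative model only, not the Spark plugin's or cudf's implementation; `ColumnChunkMeta`, `may_contain_eq`, and `select_row_groups` are hypothetical names, and the sketch handles only an equality predicate on an integer column.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ColumnChunkMeta:
    """Per-row-group metadata for one column (hypothetical model)."""
    min_value: int
    max_value: int
    # Distinct values if the chunk is fully dictionary-encoded, else None.
    dictionary: Optional[Set[int]] = None


def may_contain_eq(chunk: ColumnChunkMeta, value: int) -> bool:
    """Return False only when `column == value` provably matches no row."""
    # Statistics check: the value lies outside the [min, max] range.
    if value < chunk.min_value or value > chunk.max_value:
        return False
    # Dictionary check: the value is absent from a complete dictionary.
    if chunk.dictionary is not None and value not in chunk.dictionary:
        return False
    return True


def select_row_groups(row_groups: List[Dict[str, ColumnChunkMeta]],
                      column: str, value: int) -> List[int]:
    """Indices of row groups that still have to be read."""
    return [i for i, rg in enumerate(row_groups)
            if may_contain_eq(rg[column], value)]
```

In this model a row group whose min/max range admits the value can still be skipped when its dictionary rules the value out, which is the extra pruning the statistics alone cannot provide.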

revans2 · 2024-12-17T16:59:27Z

My perspective is the same as @jlowe's. We are in the process of working with CUDF to come up with a design for doing more lazy decoding of data. In fact, right now we are looking at lazy fetching of data as well. It is still very preliminary, though.

sperlingxx added the "? - Needs Triage" and "feature request" labels Dec 17, 2024

sperlingxx added the "cudf_dependency" label Dec 17, 2024

mattahrens removed the "? - Needs Triage" label Dec 17, 2024
