Fuse operations in equal_rows_arr #12131

Open
Dandandan opened this issue Aug 23, 2024 · 3 comments · May be fixed by #13607
Labels
enhancement New feature or request performance Make DataFusion faster

Comments

@Dandandan
Contributor

Dandandan commented Aug 23, 2024

Is your feature request related to a problem or challenge?

equal_rows_arr compares pairs of arrays for equality at the given indices, and it shows up in profiles.

Currently this is done as follows:

  • taking the values at the indices for the first pair of arrays
  • comparing the resulting arrays using eq or not_distinct
  • doing the same for the remaining pairs and ANDing the results
  • filtering the indices based on the resulting boolean array
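To make the cost of these steps concrete, here is a minimal model of the current flow in plain Rust. It is only a sketch: `Column`, `take_rows`, `eq_mask`, `filter_indices` and `equal_rows_unfused` are hypothetical names, and `Vec<Option<i64>>` stands in for a nullable Arrow array, so this is not the actual DataFusion code. It does show where the intermediate allocations come from: two copied columns and one fresh mask per pair.

```rust
/// Hypothetical stand-in for a nullable Arrow column; `None` models a null.
type Column = Vec<Option<i64>>;

/// Models `take`: copies the selected rows into a new column.
fn take_rows(col: &Column, indices: &[usize]) -> Column {
    indices.iter().map(|&i| col[i]).collect()
}

/// Models `eq` (null never matches) or `not_distinct` (null matches null).
fn eq_mask(left: &Column, right: &Column, null_equals_null: bool) -> Vec<bool> {
    left.iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            (Some(a), Some(b)) => a == b,
            (None, None) => null_equals_null,
            _ => false,
        })
        .collect()
}

/// Models `filter`: keeps the indices whose mask entry is true.
fn filter_indices(indices: &[usize], mask: &[bool]) -> Vec<usize> {
    indices
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(&i, _)| i)
        .collect()
}

/// Unfused flow: take, compare, AND, then filter.
fn equal_rows_unfused(
    left_idx: &[usize],
    right_idx: &[usize],
    left_cols: &[Column],
    right_cols: &[Column],
    null_equals_null: bool,
) -> (Vec<usize>, Vec<usize>) {
    let mut mask = vec![true; left_idx.len()];
    for (lc, rc) in left_cols.iter().zip(right_cols) {
        let l = take_rows(lc, left_idx); // allocation: copied left rows
        let r = take_rows(rc, right_idx); // allocation: copied right rows
        let m = eq_mask(&l, &r, null_equals_null); // allocation: per-pair mask
        mask = mask.iter().zip(&m).map(|(a, b)| *a && *b).collect(); // allocation: ANDed mask
    }
    (filter_indices(left_idx, &mask), filter_indices(right_idx, &mask))
}
```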

Describe the solution you'd like

We could optimize this in a few ways:

  • writing a kernel that doesn't use take (i.e. doesn't copy the arrays) but compares the arrays directly at the given indices
  • writing results to a single BooleanBuffer rather than creating a new one for every pair
  • removing non-matching indices from the list (e.g. using Vec::retain) rather than creating a boolean array for a filter
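A rough sketch of what the fused version might look like, again in plain Rust rather than against the Arrow API. `Column` and `retain_equal_rows` are hypothetical names, and `Vec<Option<i64>>` is an assumed stand-in for a nullable Arrow array. Rows are compared in place through the indices, and `Vec::retain` drops non-matching pairs directly, so no copied columns or boolean masks are materialized.

```rust
/// Hypothetical stand-in for a nullable Arrow column; `None` models a null.
type Column = Vec<Option<i64>>;

/// Keeps only the (left, right) index pairs whose rows are equal in every
/// column pair. `left_cols[k]` is compared against `right_cols[k]`.
///
/// `null_equals_null = true` models `not_distinct` (null matches null);
/// `false` models `eq`, where a null never matches.
fn retain_equal_rows(
    pairs: &mut Vec<(usize, usize)>,
    left_cols: &[Column],
    right_cols: &[Column],
    null_equals_null: bool,
) {
    assert_eq!(left_cols.len(), right_cols.len());
    pairs.retain(|&(l, r)| {
        // `all` short-circuits on the first column pair that differs,
        // so later columns are not touched for rows already rejected.
        left_cols
            .iter()
            .zip(right_cols)
            .all(|(lc, rc)| match (lc[l], rc[r]) {
                (Some(a), Some(b)) => a == b,
                (None, None) => null_equals_null,
                _ => false,
            })
    });
}
```

One trade-off to keep in mind: this row-at-a-time loop gives up the vectorized column-at-a-time comparison, so whether it is actually faster would need to be benchmarked.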

Describe alternatives you've considered

No response

Additional context

No response

@Rachelint
Contributor

take

@Rachelint Rachelint removed their assignment Nov 11, 2024
@Rachelint
Contributor

Unassigning myself from this since I'm a bit busy at the moment.
@LeslieKid Maybe you would be interested in this.

@LeslieKid
Contributor

take
