
Improve performance of db-benchmark query 8 #13586

Open
Dandandan opened this issue Nov 27, 2024 · 9 comments
Assignees
Labels
enhancement New feature or request

Comments

@Dandandan
Contributor

Is your feature request related to a problem or challenge?

Query 8 in db-benchmark is slower than the other queries:
https://github.com/MrPowers/mrpowers-benchmarks

The query is as follows:

select id6, largest2_v3 from
  (select id6, v3 as largest2_v3, row_number() over (partition by id6 order by v3 desc) as order_v3
  from x
  where v3 is not null) sub_query
where order_v3 <= 2
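To make the query's semantics concrete, here is a small stand-alone Rust sketch (an illustration only, not DataFusion code) of what q8 computes: the two largest non-null v3 values per id6 group, i.e. the rows surviving the row_number() <= 2 filter.

```rust
// Illustration of q8's semantics: top-2 v3 per id6, nulls filtered out.
use std::collections::HashMap;

fn top2_per_group(rows: &[(&str, Option<f64>)]) -> HashMap<String, Vec<f64>> {
    let mut groups: HashMap<String, Vec<f64>> = HashMap::new();
    for (id6, v3) in rows {
        if let Some(v) = v3 {
            // where v3 is not null
            groups.entry(id6.to_string()).or_default().push(*v);
        }
    }
    for vals in groups.values_mut() {
        vals.sort_by(|a, b| b.partial_cmp(a).unwrap()); // order by v3 desc
        vals.truncate(2); // keep rows with row_number() <= 2
    }
    groups
}

fn main() {
    let rows = [
        ("a", Some(1.0)),
        ("a", Some(3.0)),
        ("a", Some(2.0)),
        ("b", None),
        ("b", Some(5.0)),
    ];
    let top = top2_per_group(&rows);
    assert_eq!(top["a"], vec![3.0, 2.0]);
    assert_eq!(top["b"], vec![5.0]);
}
```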

Describe the solution you'd like

  • Profile / analyze this query
  • Improve performance

The most expensive part might be the row_number() over (partition by id6 order by v3 desc), or rather the sorting it requires.


Describe alternatives you've considered

No response

Additional context

No response

@Dandandan Dandandan added the enhancement New feature or request label Nov 27, 2024
@alan910127
Contributor

take

@Dandandan
Contributor Author

Dandandan commented Nov 29, 2024

I didn't profile it yet, but one potentially problematic line I found is:

concat_batches(self.input_schema(), [input_buffer, &record_batch])?

This concatenates [input_buffer, &record_batch] on every incoming batch.

Changing the input_buffer state to a Vec<RecordBatch> and delaying the concatenation would be better, as concatenating inside a loop is O(n^2) and has much more overhead.
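The quadratic blow-up can be sketched with plain Vec<u8> buffers standing in for RecordBatches (an illustration only, not DataFusion code): concatenating on every iteration recopies the whole accumulated buffer each time, while buffering in a Vec and concatenating once copies each byte exactly once.

```rust
// O(n^2): every iteration copies the whole accumulated buffer again,
// analogous to calling concat_batches(schema, [input_buffer, &batch])
// for each incoming batch.
fn concat_per_batch(batches: &[Vec<u8>]) -> Vec<u8> {
    let mut buffer: Vec<u8> = Vec::new();
    for batch in batches {
        let mut combined = Vec::with_capacity(buffer.len() + batch.len());
        combined.extend_from_slice(&buffer);
        combined.extend_from_slice(batch);
        buffer = combined;
    }
    buffer
}

// O(n): keep the batches around and concatenate once at the end,
// analogous to a Vec<RecordBatch> state with a single final concat_batches.
fn concat_once(batches: &[Vec<u8>]) -> Vec<u8> {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    let mut buffer = Vec::with_capacity(total);
    for batch in batches {
        buffer.extend_from_slice(batch);
    }
    buffer
}

fn main() {
    let batches: Vec<Vec<u8>> = (0..4).map(|i| vec![i as u8; 3]).collect();
    // Both produce the same bytes; only the copying cost differs.
    assert_eq!(concat_per_batch(&batches), concat_once(&batches));
}
```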

@alan910127
Contributor

alan910127 commented Nov 30, 2024

Hi @Dandandan, I tried profiling the execution of datafusion-cli with the following command:

CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph -o q8.svg -- -f q8.sql

Note: I have an x directory generated by falsa with

falsa groupby --path-prefix=./x --size MEDIUM --data-format PARQUET

From the flamegraph, I see that the concat_batches function you mentioned only takes < 5% of the total time. Since I'm not very good at this type of performance optimization, I'm unsure whether this is the primary issue. I may need more time to investigate further.

EDIT:
concat_batches is taking ~56% of the time running update_partition_batch

@Dandandan
Contributor Author

concat_batches is taking ~56% of the time running update_partition_batch

That is an interesting finding. Any chance you could share the flamegraph?

For concat_batches, I think a solution could be to use Vec<RecordBatch> for the state and delay the concatenation.

@alan910127
Contributor

Any chance you could share the flamegraph?

Certainly! Here it is:

[flamegraph: q8.svg]

@Dandandan
Contributor Author

Hm, so most of the time does indeed seem to be spent in sort / merge, so I think that has the highest priority.

@alan910127
Contributor

I’ve generated another flamegraph using --inverted --reverse, but it’s too large to include directly in this comment. Do you have any suggestions for sharing it?

@alan910127
Contributor

alan910127 commented Dec 3, 2024

I found out that <arrow_row::Row as core::cmp::Ord>::cmp is consuming a significant amount of time in is_gt within update_loser_tree. It eventually calls SliceOrd::compare, which compares each element in the slices until it finds a non-equal case or, if all elements are equal, compares the lengths. Do we have any possible heuristics to avoid comparing the slices?

EDIT: I used Compiler Explorer to check what SliceOrd::compare::<u8> (which is what Row::cmp calls) does, and saw that it calls memcmp under the hood. Given that the [libc.so.6] portion appears quite large within Row::cmp in the flamegraph, I suspect that using heuristics to avoid comparing slices could be beneficial.
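The comparison described above is just lexicographic slice ordering. A minimal stand-alone sketch (not the arrow-rs source) of what Row::cmp boils down to, using the standard library's slice Ord, which lowers to memcmp for &[u8]:

```rust
use std::cmp::Ordering;

// Sketch: the arrow-rs row format is designed so that comparing two
// encoded rows is exactly a lexicographic byte comparison. For &[u8]
// the standard library compares element by element until a difference
// is found, then falls back to comparing lengths (SliceOrd::compare),
// and the element-wise part compiles down to memcmp.
fn row_cmp(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

fn main() {
    assert_eq!(row_cmp(b"abc", b"abd"), Ordering::Less);
    // Equal prefix: the longer slice is greater.
    assert_eq!(row_cmp(b"abc", b"ab"), Ordering::Greater);
    assert_eq!(row_cmp(b"abc", b"abc"), Ordering::Equal);
}
```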

@Dandandan
Contributor Author

Dandandan commented Dec 4, 2024

SortPreservingMergeStream now works as follows since #3386

  • Converting input rows to the row format (which involves copying / converting the data into a byte-comparable format)
  • Comparing the sorted partitions/streams against each other (via compare), choosing the smallest/largest row

I think for single-column sorts (like this query) we should be able to avoid converting to Row and compare the (primitive) values instead, which should speed up single-column sorts by skipping the conversion step entirely.

SortExec already uses a specialized single-column sort.
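The idea can be sketched with a tiny k-way merge over already-sorted primitive streams (a hypothetical illustration, not the SortPreservingMergeStream implementation): the loser-tree / heap comparisons operate on the i64 values directly, so no row-format encoding step is needed.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Merge k ascending i64 streams by comparing the primitives themselves,
// instead of encoding each row into a byte-comparable Row first.
fn merge_sorted(streams: Vec<Vec<i64>>) -> Vec<i64> {
    // Min-heap of (value, stream index, position within stream).
    let mut heap = BinaryHeap::new();
    for (i, s) in streams.iter().enumerate() {
        if let Some(&v) = s.first() {
            heap.push(Reverse((v, i, 0usize)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, i, pos))) = heap.pop() {
        out.push(v);
        // Advance the stream the winner came from.
        if let Some(&next) = streams[i].get(pos + 1) {
            heap.push(Reverse((next, i, pos + 1)));
        }
    }
    out
}

fn main() {
    let merged = merge_sorted(vec![vec![1, 4, 7], vec![2, 5], vec![3, 6]]);
    assert_eq!(merged, vec![1, 2, 3, 4, 5, 6, 7]);
}
```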
