Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc: Optimize the computation of sliding window kRange frame bound #11285

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

PHILO-HE
Copy link
Contributor

@PHILO-HE PHILO-HE commented Oct 17, 2024

For preceding frame, considering we are doing the search on ascending ordered column, the current row's search range can be [lastSearchResult, currentRow], not necessary to be [0, currentRow].

See below example for start bound search (1 preceding). From last search, we know the start bound for 6 is 5 (rowID = 2). Then, when searching start bound for 7, as current_row >= last_row is always true for ascending ordered column, we know current_row - 1 > = last_row - 1. Thus, we can start the search from rowID = 2 (last search result) instead of 0.

rowID     column (ordered)
0                  3
1                  4
2                  5
3                  6
4                  7
...                 ...

For following frame, the search can be optimized similarly. In summary, this optimized search range looks like a sliding window. This optimization is especially useful for large window partition case.

This pr also includes a benchmark test. The below result shows that it can bring ~45% perf. gain in large partition case (the 1st case). In the 2nd case, the partition size is smaller, for which it brings ~37% perf. gain. And in the 3rd case, partition col data is produced randomly, which means this extreme case has very small partition size (even each partition has one single row). So there is no perf gain as expected.

Without this optimization:

============================================================================
[...]hmarks/WindowKRangeFrameBenchmark.cpp     relative  time/iter   iters/s
============================================================================
doRun(count_INTEGER_k_array)                              467.35ms      2.14
----------------------------------------------------------------------------
doRun(count_INTEGER_k_norm)                               410.39ms      2.44
----------------------------------------------------------------------------
doRun(count_INTEGER_k_hash)                               661.90ms      1.51
----------------------------------------------------------------------------

With this optimization:

============================================================================
[...]hmarks/WindowKRangeFrameBenchmark.cpp     relative  time/iter   iters/s
============================================================================
doRun(count_INTEGER_k_array)                              259.84ms      3.85
----------------------------------------------------------------------------
doRun(count_INTEGER_k_norm)                               257.79ms      3.88
----------------------------------------------------------------------------
doRun(count_INTEGER_k_hash)                               663.54ms      1.51
----------------------------------------------------------------------------

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 17, 2024
Copy link

netlify bot commented Oct 17, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 05c4b1e
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6780c85c199d0900080600e3

@PHILO-HE PHILO-HE force-pushed the optimize-window-1 branch 2 times, most recently from 33c2ef2 to 2ae10fb Compare October 23, 2024 13:36
@@ -398,10 +399,12 @@ void WindowPartition::updateKRangeFrameBounds(
// [0, currentRow] are examined. For following bounds, rows between
// [currentRow, numRows()) are checked.
if (isPreceding) {
start = 0;
start = lastFoundIndex == -1 ? 0 : lastFoundIndex;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PHILO-HE : My hunch is that you would need more than this change to use lastFoundIndex. You could check the value of the frameColumn at this row with that of lastFoundIndex to use this optimization. If the frame bound value does not follow the sliding pattern that this optimization is targeted for then its not applicable.

@PHILO-HE PHILO-HE force-pushed the optimize-window-1 branch 2 times, most recently from 0bff19e to dc52ae7 Compare December 9, 2024 07:32
@PHILO-HE PHILO-HE changed the title Dynamically update start index to search value for range frame misc: Dynamically update start index to search value for range frame Dec 9, 2024
@PHILO-HE PHILO-HE force-pushed the optimize-window-1 branch 11 times, most recently from e08c19b to 69a1793 Compare December 11, 2024 10:09
@PHILO-HE PHILO-HE changed the title misc: Dynamically update start index to search value for range frame misc: Optimize the computation of frame bound in a sliding way for window kRange frame Dec 11, 2024
@PHILO-HE PHILO-HE changed the title misc: Optimize the computation of frame bound in a sliding way for window kRange frame misc: Optimize the computation of bound in a sliding way for window kRange frame Dec 11, 2024
@PHILO-HE PHILO-HE marked this pull request as ready for review December 11, 2024 12:58
@PHILO-HE
Copy link
Contributor Author

Hi @aditi-pandit, could you spare some time to review this pr?

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PHILO-HE. Just from a quick read of your code.

continue;
}

auto compareWithLastRow = [&](vector_size_t currentRow) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compareWithLastRow should return an integer value

// is applicable, we also use the last row's search result to determine
// the start search bound for current row.

if (compareWithLastRow(currentRow) == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a local variable for compareWithLastRow(currentRow) as it is invoked multiple times in the code-path. Would be avoid to avoid data_->compare each time.

} else {
start = rawFrameBounds[i - 1] == -1
? 0
: std::min(rawFrameBounds[i - 1], i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate the reason for the use of std::min here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit, seems not necessary. Just removed. Thanks!

@PHILO-HE
Copy link
Contributor Author

@aditi-pandit, just fixed your comment. Could you please review again?

@PHILO-HE
Copy link
Contributor Author

Friendly ping @aditi-pandit. Thanks!

@aditi-pandit aditi-pandit changed the title misc: Optimize the computation of bound in a sliding way for window kRange frame misc: Optimize the computation of sliding window kRange frame bound Dec 20, 2024
Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PHILO-HE : Another quick round of comments. Had bunch of questions around tests and the benchmark.


add_executable(velox_window_krange_frame_benchmark
WindowKRangeFrameBenchmark.cpp)
target_link_libraries(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit : Add newline in between.

continue;
}

if (isPreceding) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new boolean variable bool sliding = compareResult > 0, and use sliding in the checks.

@@ -108,3 +108,16 @@ target_link_libraries(
velox_vector_test_lib
velox_window
Folly::follybenchmark)

add_executable(velox_window_krange_frame_benchmark
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence if this benchmark should be submit in the code-base. The results you have are sufficient.

Much of this setup looks like that from https://github.com/facebookincubator/velox/blob/main/velox/exec/benchmarks/WindowPrefixSortBenchmark.cpp. Is there a means to consolidate the code ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit, we may have more optimizations on kRange frame in the future. So I prefer keeping this benchmark test at least in short term.
This test is modified based on WindowPrefixSortBenchmark.cpp. There are some differences in data generating, test flags, etc., which looks not easy to consolidate the code.

// is applicable, we can use the last row's search result to determine
// the start search index for current row.

auto compareResult = compareWithLastRow(currentRow);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is tested by the cases here:

void WindowTestBase::testKRangeFrames(const std::string& function) {

And these are all sliding window range frames.

Could we add more tests for arbitrary frames now as your code needs to catch those ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit, in real world SQL, kRange frame always follows the sliding pattern, considering constant offset is applied on ordered column data row by row. We added the check for sliding pattern just for making fuzzer test pass (arbitrary frame data is generated in fuzzer test). So I think it doesn't need to add some dedicated test for arbitrary case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PHILO-HE : Agree, that real world SQL kRange usually follows sliding pattern. But at the same time we are allowing random column based values as well. Agree that the fuzzer would generate such frames. But I feel its worth adding a unit test, even if its a simple one for the sake of completeness. It would be good for demonstration of the use of it and that your code is robust in such a case.

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PHILO-HE : Thanks for this improvement. Have reviewed the benchmark as well.

This PR is looking overall good minus the test.

auto compareResult = compareWithLastRow(currentRow);
if (compareResult == 0) {
// As same as last row's search result.
rawFrameBounds[i] = rawFrameBounds[i - 1];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write comments using full sentences. In this case it could be "Use the same frame bound as the previous row, as the value of the current row is same as the previous row."


bool slidingSearch = compareResult < 0;
if (isPreceding) {
// It's the first row or it doesn't match the sliding search pattern.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not needed.

}
end = currentRow + 1;
} else {
// It's the first row or it doesn't match the sliding search pattern.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not needed.

BENCHMARK_DRAW_LINE();

// Count aggregate.
AGG_BENCHMARKS(count, k_array)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using different aggregates for the different distributions could be more interesting.


VectorFuzzer::Options opts;
opts.vectorSize = kRowsPerVector;
opts.nullRatio = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add some nulls in between as well, since the range frames code path has explicit null checks as well.

children.emplace_back(makeFlatVector<int32_t>(
kRowsPerVector, [](auto row) { return row; }));

const int kPrecedingOffset = 3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we choose random numbers here instead ?

task->noMoreSplits(tableScanPlanId);
suspender1.dismiss();

while (auto result = task->next()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should count the timing in the Window operator for a more fine-tuned measurement of the range search improvement.

Getting the entire task run time includes many other components like TableScan read which are expensive as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants