-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
misc: Optimize the computation of sliding window kRange frame bound #11285
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for meta-velox canceled.
|
33c2ef2
to
2ae10fb
Compare
velox/exec/WindowPartition.cpp
Outdated
@@ -398,10 +399,12 @@ void WindowPartition::updateKRangeFrameBounds( | |||
// [0, currentRow] are examined. For following bounds, rows between | |||
// [currentRow, numRows()) are checked. | |||
if (isPreceding) { | |||
start = 0; | |||
start = lastFoundIndex == -1 ? 0 : lastFoundIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PHILO-HE : My hunch is that you would need more than this change to use lastFoundIndex. You could check the value of the frameColumn at this row with that of lastFoundIndex to use this optimization. If the frame bound value does not follow the sliding pattern that this optimization is targeted for then its not applicable.
0bff19e
to
dc52ae7
Compare
e08c19b
to
69a1793
Compare
Hi @aditi-pandit, could you spare some time to review this pr? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @PHILO-HE. Just from a quick read of your code.
velox/exec/WindowPartition.cpp
Outdated
continue; | ||
} | ||
|
||
auto compareWithLastRow = [&](vector_size_t currentRow) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compareWithLastRow should return an integer value
velox/exec/WindowPartition.cpp
Outdated
// is applicable, we also use the last row's search result to determine | ||
// the start search bound for current row. | ||
|
||
if (compareWithLastRow(currentRow) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have a local variable for compareWithLastRow(currentRow) as it is invoked multiple times in the code-path. Would be avoid to avoid data_->compare each time.
velox/exec/WindowPartition.cpp
Outdated
} else { | ||
start = rawFrameBounds[i - 1] == -1 | ||
? 0 | ||
: std::min(rawFrameBounds[i - 1], i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate the reason for the use of std::min here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aditi-pandit, seems not necessary. Just removed. Thanks!
@aditi-pandit, just fixed your comment. Could you please review again? |
Friendly ping @aditi-pandit. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PHILO-HE : Another quick round of comments. Had bunch of questions around tests and the benchmark.
|
||
add_executable(velox_window_krange_frame_benchmark | ||
WindowKRangeFrameBenchmark.cpp) | ||
target_link_libraries( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit : Add newline in between.
continue; | ||
} | ||
|
||
if (isPreceding) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a new boolean variable bool sliding = compareResult > 0, and use sliding in the checks.
@@ -108,3 +108,16 @@ target_link_libraries( | |||
velox_vector_test_lib | |||
velox_window | |||
Folly::follybenchmark) | |||
|
|||
add_executable(velox_window_krange_frame_benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence if this benchmark should be submit in the code-base. The results you have are sufficient.
Much of this setup looks like that from https://github.com/facebookincubator/velox/blob/main/velox/exec/benchmarks/WindowPrefixSortBenchmark.cpp. Is there a means to consolidate the code ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aditi-pandit, we may have more optimizations on kRange frame in the future. So I prefer keeping this benchmark test at least in short term.
This test is modified based on WindowPrefixSortBenchmark.cpp
. There are some differences in data generating, test flags, etc., which looks not easy to consolidate the code.
// is applicable, we can use the last row's search result to determine | ||
// the start search index for current row. | ||
|
||
auto compareResult = compareWithLastRow(currentRow); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is tested by the cases here:
void WindowTestBase::testKRangeFrames(const std::string& function) { |
And these are all sliding window range frames.
Could we add more tests for arbitrary frames now as your code needs to catch those ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aditi-pandit, in real world SQL, kRange frame always follows the sliding pattern, considering constant offset is applied on ordered column data row by row. We added the check for sliding pattern just for making fuzzer test pass (arbitrary frame data is generated in fuzzer test). So I think it doesn't need to add some dedicated test for arbitrary case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PHILO-HE : Agree, that real world SQL kRange usually follows sliding pattern. But at the same time we are allowing random column based values as well. Agree that the fuzzer would generate such frames. But I feel its worth adding a unit test, even if its a simple one for the sake of completeness. It would be good for demonstration of the use of it and that your code is robust in such a case.
68f14c1
to
8f3a1b3
Compare
8f3a1b3
to
05c4b1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PHILO-HE : Thanks for this improvement. Have reviewed the benchmark as well.
This PR is looking overall good minus the test.
auto compareResult = compareWithLastRow(currentRow); | ||
if (compareResult == 0) { | ||
// As same as last row's search result. | ||
rawFrameBounds[i] = rawFrameBounds[i - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write comments using full sentences. In this case it could be "Use the same frame bound as the previous row, as the value of the current row is same as the previous row."
|
||
bool slidingSearch = compareResult < 0; | ||
if (isPreceding) { | ||
// It's the first row or it doesn't match the sliding search pattern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not needed.
} | ||
end = currentRow + 1; | ||
} else { | ||
// It's the first row or it doesn't match the sliding search pattern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not needed.
BENCHMARK_DRAW_LINE(); | ||
|
||
// Count aggregate. | ||
AGG_BENCHMARKS(count, k_array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using different aggregates for the different distributions could be more interesting.
|
||
VectorFuzzer::Options opts; | ||
opts.vectorSize = kRowsPerVector; | ||
opts.nullRatio = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add some nulls in between as well, since the range frames code path has explicit null checks as well.
children.emplace_back(makeFlatVector<int32_t>( | ||
kRowsPerVector, [](auto row) { return row; })); | ||
|
||
const int kPrecedingOffset = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we choose random numbers here instead ?
task->noMoreSplits(tableScanPlanId); | ||
suspender1.dismiss(); | ||
|
||
while (auto result = task->next()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should count the timing in the Window operator for a more fine-tuned measurement of the range search improvement.
Getting the entire task run time includes many other components like TableScan read which are expensive as well.
For preceding frame, considering we are doing the search on ascending ordered column, the current row's search range can be
[lastSearchResult, currentRow]
, not necessary to be[0, currentRow]
.See below example for start bound search (1 preceding). From last search, we know the start bound for 6 is 5 (rowID = 2). Then, when searching start bound for 7, as
current_row >= last_row
is always true for ascending ordered column, we knowcurrent_row - 1 > = last_row - 1
. Thus, we can start the search from rowID = 2 (last search result) instead of 0.For following frame, the search can be optimized similarly. In summary, this optimized search range looks like a sliding window. This optimization is especially useful for large window partition case.
This pr also includes a benchmark test. The below result shows that it can bring ~45% perf. gain in large partition case (the 1st case). In the 2nd case, the partition size is smaller, for which it brings ~37% perf. gain. And in the 3rd case, partition col data is produced randomly, which means this extreme case has very small partition size (even each partition has one single row). So there is no perf gain as expected.
Without this optimization:
With this optimization: