Wave Metrics #10679
oerling commented Aug 7, 2024
- Make multithreaded memcpy for staging transfers for GPU table scan.
- Make variants of bit unpacking in GpuDecoder-inl.cuh. Make selective decoding templatized as opposed to runtime switching.
- Add pieces to GpuDecoderTest, such as comparing calls via launchDecode (multi-function thread blocks) with calls via decodeGlobal (single-function thread blocks).
- Add a metric for driver thread waiting for first continuable stream.
- Check approximate correctness of Wave runtimeStats.
- Refactor QueryBenchmarkBase.* from TpchBenchmark. Logic to do sweeps across parameter combinations.
- Add persistent file format to Wave mock format.
- Add benchmark for scan, filter, filter expr, projection, and aggregation combinations with Wave and Dwrf.
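The first item, a multithreaded memcpy for staging transfers, can be illustrated with a minimal sketch. This is not the code from the PR: `parallelMemcpy`, its parameters, and the `kMinBytesPerThread` threshold are hypothetical names chosen for illustration, and plain host memory stands in for whatever staging buffer the GPU table scan actually uses. The idea is only to split one large copy into contiguous slices and copy them concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper, not a Velox/Wave API. Copies 'bytes' bytes from 'src'
// to 'dst', splitting the range into contiguous slices copied in parallel.
inline void parallelMemcpy(
    void* dst,
    const void* src,
    size_t bytes,
    size_t numThreads) {
  assert(bytes == 0 || (dst != nullptr && src != nullptr));
  if (bytes == 0) {
    return;
  }
  // Assumed threshold: do not start a thread for less than 1 MiB of work.
  constexpr size_t kMinBytesPerThread = size_t(1) << 20;
  numThreads = std::max<size_t>(
      1, std::min(numThreads, bytes / kMinBytesPerThread));
  if (numThreads == 1) {
    std::memcpy(dst, src, bytes);
    return;
  }

  // Round up so that numThreads slices cover the whole range.
  const size_t chunk = (bytes + numThreads - 1) / numThreads;
  auto* out = static_cast<char*>(dst);
  const auto* in = static_cast<const char*>(src);

  std::vector<std::thread> workers;
  workers.reserve(numThreads - 1);
  for (size_t i = 1; i < numThreads; ++i) {
    const size_t offset = i * chunk;
    if (offset >= bytes) {
      break;
    }
    const size_t size = std::min(chunk, bytes - offset);
    workers.emplace_back(
        [=] { std::memcpy(out + offset, in + offset, size); });
  }
  // The calling thread copies the first slice instead of idling.
  std::memcpy(out, in, std::min(chunk, bytes));
  for (auto& worker : workers) {
    worker.join();
  }
}
```

Spawning threads per call keeps the sketch short; a long-lived thread pool would avoid the thread start-up cost on every transfer, and the right slice size depends on the memory bandwidth of the machine.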
✅ Deploy Preview for meta-velox canceled.
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request was exported from Phabricator. Differential Revision: D60880466
1a2f4a5 to 4a20f55
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
GTest::gtest
GTest::gtest_main
GTest::gmock
Please use the properly namespaced targets.
I have fixed it in #10732