Small optimizations on iggen buffer handling #317
Check-perf-impact results (c8fb992b35322012b54e351345fdf71a): ✔️ No significant performance change in the microbenchmark set. You are good to go!
Pull Request Test Coverage Report for Build 12787152826 (Coveralls)
Nicely done.
LGTM! 👍
LGTM! I've suggested two comment changes that I've added for my understanding while investigating how to implement replicated writes!
This also avoids an unordered_map by transposing the perform_task_buffer_accesses loop.
`perform_task_buffer_accesses` updates last-writers twice to gracefully handle overlapping writes, which is an edge case. This PR quickly checks whether overlapping writes are present and sticks to a single update if there are none. By transposing the loop nest from `chunk -> bid` to `bid -> chunk`, we can also avoid constructing another `unordered_map`.

Results are not looking too impressive in the benchmark report, but I do get a consistent 4% speedup for RSim `room_small`, which is scheduler bound on gpuc3.
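For readers who want a feel for the idea without reading the diff, here is a minimal, self-contained sketch. It is not the code from this PR: regions are simplified to 1D half-open ranges, and `interval`, `buffer_write`, `chunk`, `has_overlapping_writes` and `update_last_writers_sketch` are hypothetical names invented for illustration. The two-pass fallback is only a stand-in for whatever the real second last-writer update does. What it tries to show is the cheap overlap pre-check and how a `bid -> chunk` loop order lets per-buffer state live in a local variable instead of a map keyed by buffer id.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D stand-in for a buffer region: the half-open range [begin, end).
struct interval {
    std::size_t begin;
    std::size_t end;
};

// One write access of a chunk to buffer `bid` (illustrative, not the project's types).
struct buffer_write {
    std::size_t bid;
    interval range;
};

struct chunk {
    std::vector<buffer_write> writes;
};

constexpr std::size_t no_writer = static_cast<std::size_t>(-1);
constexpr std::size_t multiple_writers = static_cast<std::size_t>(-2);

// Cheap pre-check: once sorted by start, any overlap shows up between neighbors.
// Assumes all ranges are non-empty.
bool has_overlapping_writes(std::vector<interval> ranges) {
    std::sort(ranges.begin(), ranges.end(),
        [](const interval& a, const interval& b) { return a.begin < b.begin; });
    for (std::size_t i = 1; i < ranges.size(); ++i) {
        if (ranges[i].begin < ranges[i - 1].end) return true;
    }
    return false;
}

// last_writers[bid][i] holds the index of the chunk that last wrote element i.
// Returns how many buffers needed the extra pass.
std::size_t update_last_writers_sketch(
    const std::vector<chunk>& chunks, std::vector<std::vector<std::size_t>>& last_writers) {
    std::size_t slow_path_count = 0;
    for (std::size_t bid = 0; bid < last_writers.size(); ++bid) { // outer loop: buffers
        auto& lw = last_writers[bid];

        // Per-buffer state is a plain local here, so no map keyed by bid is needed.
        std::vector<interval> ranges;
        for (const auto& c : chunks) { // inner loop: chunks
            for (const auto& w : c.writes) {
                if (w.bid != bid) continue;
                assert(w.range.begin < w.range.end && w.range.end <= lw.size());
                ranges.push_back(w.range);
            }
        }

        // Extra pass only in the rare overlapping case: count writers per element.
        const bool overlapping = has_overlapping_writes(ranges);
        std::vector<int> writers_per_element;
        if (overlapping) {
            ++slow_path_count;
            writers_per_element.assign(lw.size(), 0);
            for (const auto& r : ranges) {
                for (std::size_t i = r.begin; i < r.end; ++i) ++writers_per_element[i];
            }
        }

        // The update pass that always runs.
        for (std::size_t cid = 0; cid < chunks.size(); ++cid) {
            for (const auto& w : chunks[cid].writes) {
                if (w.bid != bid) continue;
                for (std::size_t i = w.range.begin; i < w.range.end; ++i) {
                    const bool contested = overlapping && writers_per_element[i] > 1;
                    lw[i] = contested ? multiple_writers : cid;
                }
            }
        }
    }
    return slow_path_count;
}
```

In this sketch, two chunks writing disjoint halves of a buffer take only the single update pass, while a buffer whose write ranges intersect pays for the additional counting pass and has the contested elements marked as `multiple_writers`.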