-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arch-arm: Do not issue memory requests for uncacheable prefetches #1140
base: develop
Are you sure you want to change the base?
Conversation
Hi @arichardson, there are a couple of tests that are failing. Could you please fix that before this can be reviewed/merged? Thank you. |
Hi @arichardson. Software prefetching has a hacky implementation in gem5 which makes it buggy in some cases. While this PR fixes your run, I am afraid is not enough to fix SW prefetching properly. That would probably require to handle cpu,arch,mem-cache,mem-ruby implementation of SW prefetches at the same time.
So why does your CPU never gets a response? It does so because your data prefetch targets uncacheable data (as you can see from the debug log you posted). Therefore:
So I believe a partial solution which would make things work for you would be related to point 1, which is: discard a SW prefetch if it targets uncacheable data as data won't be stored anyway... |
Currently, a prefetch instruction will result in an infinite sleep where the CPU never wakes back up as it is expecting a dcache response even though this will not be delivered for prefetches. Avoid this problem by rejecting uncacheable prefetches in the MMU translation logic. This allows my test workload from gem5#1139 to continue running beyond strlen(). Tested using Atomic,Timing,Minor and O3 CPU. See gem5#1139 for the test case that was used. Fixes: gem5#1139 Change-Id: Ic44bdb87f4099b11a7f9c6c99768a12fbef5842e
a4578ec
to
1fc825f
Compare
I believe I have a better fix for this now, scoped to the Arm MMU code. The new change allows my new reduced test case to pass for all CPU models. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arichardson, this seems to me too much of a hack and really arm specific (whereas this is a general issue). While the software prefetching problem requires extra care, IMHO this PR can address the prefetch+uncacheable issue differently and for every ISA by amending the CPU code.
Which means handling the Uncacheable && Prefetch condition after translation has finished to prevent the prefetch request from being sent to the dcache port.
// Prevent prefetching from I/O devices or uncacheable memory. | ||
// Not rejecting such requests will result in panics/infinite loops since | ||
// the memory and cache subsystem do not expect to receive them. | ||
if (req->isPrefetch() && req->isUncacheable() && fault == NoFault) { | ||
DPRINTF(TLBVerbose, "Rejecting uncacheable prefetch for %s=%#x\n", | ||
state.isStage2 ? "IPA" : "VA", vaddr_tainted); | ||
return std::make_shared<PrefetchAbort>( | ||
vaddr, ArmFault::PrefetchUncacheable, state.isStage2, tranMethod); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is that this is just architecturally wrong.
Issuing a prefetch request for uncacheable data shouldn't trigger an exception, it should be simply discarded. I am afraid the exception name is strongly misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is existing logic to drop these which is why I used this approach. Happy to change it to something better.
Currently, a prefetch instruction will result in an infinite sleep where
the CPU never wakes back up as it is expecting a dcache response even
though this will not be delivered for prefetches. Avoid this problem by
rejecting uncacheable prefetches in the MMU translation logic.
This allows my test workload from #1139
to continue running beyond strlen(). Tested using Atomic,Timing,Minor
and O3 CPU. See #1139 for the test
case that was used.
Fixes: #1139
Change-Id: Ic44bdb87f4099b11a7f9c6c99768a12fbef5842e