-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add extra information into exception reporting #373
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing the vstval2reg.edn file otherwise LGTM.
Why do you need the address, and why do you need CAP_IDX? That was always meant to be a crutch that went away in the standard version. |
So David asked me to post the case where I would have wanted the fault address of a CHERI fault to be available:
Ignoring the SR_SUM check as it is irrelevant to the discussion what this does on a (normal page fault is): Check if the code that triggered the fault is executing in kernel mode (!user_mode(regs)) |
We do not need the faulting address in order to disambiguate the various cases in CheriBSD. I am perplexed that there is the belief Linux does. |
If you take a capability fault and PC is within copyin/out (copy_to/from_user) then you return an error from the routines. If you take a capability fault and PC is not within copyin/out then that's a kernel error (whether because a kernel access was junk or because it failed to use copyin/out) and deserves a panic. |
another issue is whether we want to carry CAP_IDX down the pipe, our feeling is that just reporting the address in Xtval2 is enough for the reported case |
I haven't looked at this particular case in any detail, but my understanding is that having the address allows the code to basically use the same path as the MMU fault. Without the address we have to omit the check the code is doing on the MMU fault case in the CHERI case. As to why the check is done, that is a good question which we are looking into, but all the arches do it. There are a few other points on this as well:
Of course, we can always find out what the address is by looking at the instruction and operands (and potentially the boundaries of the instruction as well) , but given the machine already knows all this, why not make the information available? Better a crutch now than legless later. Feel free to shoot me down:-) |
The address is already available, as if we take (e.g.) a page fault we'll report it. It's hard to justify not reporting it in this case, as many people report wanting to know it. |
The reason information is communicated to trap handlers for page faults is so that you can do all the usual CoW, swapping, etc tricks that rely on knowing what page is being accessed. Yes, we can provide the implementation, but we can provide lots of other things too. There is a concern in my head that, if you provide the information, it implies to people that they should be using it to recover from the trap and resume execution, and without the full capability what are you going to do with it beyond say "yes that's an address"? The only thing you can usefully do is change what errors you report, but you still need to report an error, and it's not clear to me that it's useful to differentiate cases, especially since you can't trust anything about the address other than that it was the address the program attempted to use, and you don't have the capability that was used as the authority. |
The viable alternative is to make all the new CSRs cap width and include the metadata with the tag cleared as we don't want to add rep checks. If the rep check would have failed on the access (particularly for a vector indexed load store where the index is very large) is the metadata useful? I suspect that it is useful in almost all cases. Then the information is more complete. Again - the address is already reported in many cases, so is trivial to add. Adding the metadata is also ok.
no-one has asked for anything else AFAIK, and this one has come up repeatedly in software development. It's important for this spec to be usable by developers and the feedback I'm getting is that it has this problem we need to address. |
Ok, that is a different argument, about intent, which I hadn't thought of I confess. Personally, I don't think we can stop people doing dumb things, after all they could do that by decoding the instruction. I think it is useful to have the address (metadata would useful as well I think) for ease of debugging if nothing else. In this case, IIUC it means the changes for CHERI are a bit less as we can common up more code, though I need to look at the code in depth. I don't like the fact that the program has tried to access something and we don't immediately know what, even though the machine does. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy with the changes. It would be good to remove the comments and addressing some of the minor typos/comments before merging though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think reporting the address so you can put it in the SEGV addr field without decoding the instruction makes a lot of sense.
Capidx is just nice to have information that I guess could be dropped and we'd force GDB to decode the instruction instead.
@bsdjhb what do you think?
Co-authored-by: Andres Amaya Garcia <[email protected]> Signed-off-by: Tariq Kurd <[email protected]>
Co-authored-by: Andres Amaya Garcia <[email protected]> Signed-off-by: Tariq Kurd <[email protected]>
Co-authored-by: Andres Amaya Garcia <[email protected]> Signed-off-by: Tariq Kurd <[email protected]>
Signed-off-by: Tariq Kurd <[email protected]>
Co-authored-by: Andres Amaya Garcia <[email protected]> Signed-off-by: Tariq Kurd <[email protected]>
Signed-off-by: Tariq Kurd <[email protected]>
As a general comment, yes, I do think it would be useful to have the faulting address. I have need of it in a CheriBSD PR where I convert CHERI faults for hybrid processes into SIGSEGV instead of SIGPROT. The current PR works on Morello but not on CHERI RISC-V due to not having the faulting address to hand. CAPIDX is not really needed IMO. Morello doesn't provide it, and any adaptation of CHERI to x86 can never provide it (not all memory operands in x86 are in registers). In general for a debugger, a user can just look at the faulting instruction disassembly to work out which register is involved. More typically a user will be looking at the source line that faulted and printing variables anyway. If you do not need a separate error code field in |
For reference, the CheriBSD PR I mentioned: CTSRD-CHERI/cheribsd#2145 |
Hm, the SIGSEGV points are fair. In that case, I would suggest putting the address in xtval and the error code in xtval2 to avoid special casing in both hardware and software. |
It would be more regular to have the address in Xtval, certainly |
I think we may as well move it now, it does make it more regular and I suspect it would get shifted in review anyway. |
Swap the roles of xtval and xtval2 such that xtval holds the address on a CHERI fault and xtval2 holds the extra CHERI cause and type information for the exception. The following also had to change along the way: * Update old references to xtval that must now point to xtval2 * Add `htval2` because otherwise we do not get any CHERI information in HS-mode * Adjust diagrams and descriptions for xtval and xtval2 This PR is a follow up to #373 --------- Signed-off-by: Tariq Kurd <[email protected]> Signed-off-by: Andres Amaya Garcia <[email protected]> Co-authored-by: Tariq Kurd <[email protected]>
We need to have the exception faulting address when a CHERI data memory access faults. Not having so is causing problems, so this adds two fields: mtval.CAP_IDX - which cap caused the fault mtval2 - XLEN wide address which faulted (already a standard CSR) stval2 - as mtval2 --------- Signed-off-by: Tariq Kurd <[email protected]> Co-authored-by: Andres Amaya Garcia <[email protected]>
Swap the roles of xtval and xtval2 such that xtval holds the address on a CHERI fault and xtval2 holds the extra CHERI cause and type information for the exception. The following also had to change along the way: * Update old references to xtval that must now point to xtval2 * Add `htval2` because otherwise we do not get any CHERI information in HS-mode * Adjust diagrams and descriptions for xtval and xtval2 This PR is a follow up to riscv#373 --------- Signed-off-by: Tariq Kurd <[email protected]> Signed-off-by: Andres Amaya Garcia <[email protected]> Co-authored-by: Tariq Kurd <[email protected]>
We need to have the exception faulting address when a CHERI data memory access faults.
Not having so is causing problems, so this adds two fields:
mtval.CAP_IDX - which cap caused the fault
mtval2 - XLEN wide address which faulted (already a standard CSR)
stval2 - as mtval2