Question About Access Fault Handling in the Kernel
Vincent Chen
vincent.chen at sifive.com
Tue Jul 22 19:08:38 PDT 2025
On Thu, Jul 17, 2025 at 9:59 PM Alexandre Ghiti <alex at ghiti.fr> wrote:
>
> Hi Vincent,
>
> First sorry for the very late reply.
>
Hi Alexandre,
No problem, I'm glad to hear that someone else is also interested in this issue.
> On 6/3/25 06:25, Vincent Chen wrote:
> > Hi all,
> > I have a question regarding the current handling flow for access
> > faults in the kernel.
> >
> > In the current implementation, the kernel consistently reports all
> > access faults using a SIGSEGV with error code SEGV_ACCERR, where
> > SEGV_ACCERR signifies "Invalid permissions for mapped object"
> > according to the POSIX specification.
> >
> > However, the RISC-V privileged specification states that
> > "Implementations may raise access-fault exceptions instead of
> > address-misaligned exceptions for some misaligned accesses, indicating
> > the instruction should not be emulated by a trap handler". In other
> > words, a load/store access fault may be caused by a misaligned AMO
> > instruction. In such cases, it seems more appropriate to report the
> > error using SIGBUS with the error code BUS_ADRALN, which indicates
> > "Invalid address alignment."
>
>
> I think you're right, it seems more appropriate. But then how do we
> recognize an access fault exception that is caused by a misaligned
> access? Do you have a solution? If that does not slow down the "normal"
> access fault, I guess we can give it a try, what do you think?
>
Indeed, I do not yet have a concrete solution to this problem. My
current thought is that we may need access to the full PMA
configuration and PMP settings in order to accurately diagnose the
root cause of an access fault.
For example, when the kernel encounters an access fault with stval =
0x1000002, we cannot directly determine whether the cause is a
misaligned AMO access, a PMP violation, or some other PMA violation.
As a result, the kernel may not be able to raise an appropriate signal
with the correct error code.
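To illustrate the ambiguity, here is a minimal sketch in C. The helper
name is_misaligned() is hypothetical and not a kernel API; it only
checks whether the faulting address is naturally aligned for a given
access size. A misaligned address is a necessary condition for a
misaligned-AMO fault, but not a sufficient one, because the same
address could also fall in a PMP-protected or PMA-restricted region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true if `addr` is not
 * naturally aligned for an access of `size` bytes.
 * `size` must be a nonzero power of two (e.g. 4 for amo*.w, 8 for amo*.d).
 */
static bool is_misaligned(uint64_t addr, unsigned int size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (uint64_t)(size - 1)) != 0;
}
```

With stval = 0x1000002, is_misaligned() is true for 4-byte and 8-byte
accesses and false for a 2-byte access. Even when it returns true, the
check alone cannot rule out a PMP or PMA violation at the same address.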
From a security perspective, exposing PMA configurations outside of
M-mode may not be appropriate. If this concern is valid, we may need
to rely on OpenSBI to assist in diagnosing such faults. However, this
would add overhead and slow down the handling of "normal" access
faults.
On the other hand, the current implementation consistently reports an
access fault via the SIGSEGV signal with the SEGV_ACCERR error code.
In my view, this reporting could be more precise in some cases. For
example, when the CPU executes an AMO instruction on a region marked
with the AMONone PMA attribute, reporting the fault using SIGBUS
with the BUS_OBJERR error code might be more appropriate. However, I
have not found any discussion of similar issues on the mailing list,
which makes me question whether this approach is worth pursuing.
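As a rough sketch of what a more precise mapping could look like,
assuming the root cause were somehow known (which is exactly the open
problem), the choice of signal and si_code might be expressed as
follows. The enum fault_cause, struct sig_report, and report_for() are
hypothetical names for illustration only and do not exist in the
kernel; only the signal numbers and si_code constants are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Hypothetical root causes of a load/store access fault. */
enum fault_cause {
    CAUSE_UNKNOWN,          /* cannot be diagnosed from S-mode */
    CAUSE_PMP_VIOLATION,    /* access denied by PMP */
    CAUSE_MISALIGNED_AMO,   /* misaligned AMO reported as access fault */
    CAUSE_AMO_UNSUPPORTED   /* AMO to a region with the AMONone PMA */
};

/* Signal number and si_code to deliver to user space. */
struct sig_report {
    int signo;
    int code;
};

/* Hypothetical mapping; unknown causes keep today's behaviour. */
static struct sig_report report_for(enum fault_cause cause)
{
    switch (cause) {
    case CAUSE_MISALIGNED_AMO:
        return (struct sig_report){ SIGBUS, BUS_ADRALN };
    case CAUSE_AMO_UNSUPPORTED:
        return (struct sig_report){ SIGBUS, BUS_OBJERR };
    case CAUSE_PMP_VIOLATION:
    case CAUSE_UNKNOWN:
    default:
        return (struct sig_report){ SIGSEGV, SEGV_ACCERR };
    }
}
```

Falling back to SIGSEGV with SEGV_ACCERR for anything that cannot be
diagnosed keeps the existing behaviour for the common case.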
> Did you encounter this situation where you were reported a SIGSEGV but
> it was actually a SIGBUS?
>
Yes, I encountered this issue when running the lockbus test in
stress-ng v0.18.12. This test case uses an AMO instruction to access
a misaligned address and expects to receive a SIGBUS signal as a
result.
> Sorry again for the delay,
>
> Thanks,
>
> Alex
>
>
> >
> > As I understand it, most access faults are typically caused by
> > violations of either PMA or PMP settings. If these settings are
> > considered a form of permission for a mapped object, then the current
> > implementation using SIGSEGV with SEGV_ACCERR makes sense and helps
> > simplify the fault-handling logic.
> >
> > I’d like to confirm whether this interpretation reflects the reasoning
> > behind the current implementation. Any clarification would be
> > appreciated. Thanks!
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv