Question About Access Fault Handling in the Kernel
Vincent Chen
vincent.chen at sifive.com
Tue Aug 12 01:44:30 PDT 2025
On Thu, Aug 7, 2025 at 5:13 PM Alexandre Ghiti <alex at ghiti.fr> wrote:
>
>
> On 7/23/25 04:08, Vincent Chen wrote:
> > On Thu, Jul 17, 2025 at 9:59 PM Alexandre Ghiti <alex at ghiti.fr> wrote:
> >> Hi Vincent,
> >>
> >> First sorry for the very late reply.
> >>
> > Hi Alexandre,
> >
> > No problem, I'm glad to hear that someone else is also interested in this issue.
> >
> >> On 6/3/25 06:25, Vincent Chen wrote:
> >>> Hi all,
> >>> I have a question regarding the current handling flow for access
> >>> faults in the kernel.
> >>>
> >>> In the current implementation, the kernel consistently reports all
> >>> access faults using a SIGSEGV with error code SEGV_ACCERR, where
> >>> SEGV_ACCERR signifies "Invalid permissions for mapped object"
> >>> according to the POSIX specification.
> >>>
> >>> However, in the RISC-V privileged specification, it states that
> >>> "Implementations may raise access-fault exceptions instead of
> >>> address-misaligned exceptions for some misaligned accesses, indicating
> >>> the instruction should not be emulated by a trap handler". In other
> >>> words, a load/store access fault may be caused by a misaligned AMO
> >>> instruction. In such cases, it seems more appropriate to report the
> >>> error using SIGBUS with the error code BUS_ADRALN, which indicates
> >>> "Invalid address alignment."
> >>
> >> I think you're right, it seems more appropriate. But then how do we
> >> recognize an access fault exception that is caused by a misaligned
> >> access? Do you have a solution? If that does not slow down the "normal"
> >> access fault, I guess we can give it a try, what do you think?
> >>
> > Indeed, I do not yet have a concrete solution to this problem. My
> > current thought is that we may need access to the full PMA
> > configuration and PMP settings in order to accurately diagnose the
> > root cause of an access fault.
> >
> > For example, when the kernel encounters an access fault with stval =
> > 0x1000002, we cannot directly determine whether the cause is a
> > misaligned AMO access, a PMP violation, or some other PMA violation.
> > As a result, the kernel may not be able to raise an appropriate signal
> > with the correct error code.
> >
> > From a security perspective, exposing PMA configurations outside of
> > M-mode may not be appropriate. If this concern is valid, we may need
> > to rely on OpenSBI to assist in diagnosing such faults. However, this
> > would introduce additional overhead and slow down the handling of
> > "normal" access faults.
> >
> > On the other hand, the current implementation consistently reports an
> > access fault via the SIGSEGV signal with the SEGV_ACCERR error code.
> > To me, this approach could be made more precise in some cases. For
> > example, when the CPU executes an AMO instruction on a region marked
> > with the AMONone PMA attribute, reporting the fault using the SIGBUS
> > with the BUS_OBJERR error code might be more appropriate. However, I
> > have not discovered any discussions about similar issues on the
> > mailing list, which makes me question whether this approach is worth
> > pursuing.
>
>
> I don't think we need PMA/PMP to distinguish all possible access fault
> errors, it seems more like a platform issue to me. I mean do not expose
> an AMONone region as normal RAM with a normal distro, you must expect
> that your platform will need a specific userspace and kernel, right?
>
> For your initial problem, I think that for userspace access faults, we
> can try to find a VMA that contains stval, if it exists, that's a
> misaligned access fault, otherwise that's a "normal" access fault. Would
> that work?
>
I agree that most problems reported as access faults can be
categorized as platform issues. We can assume that the entire Linux
environment operates within a memory region configured with the
correct PMP and PMA settings, and that Linux does not create any
VA-to-PA mapping where the PA is inaccessible.
Based on this assumption, I think your idea makes sense. However, I
have another question regarding it. If I understand correctly, an
access fault exception occurs due to a PMA/PMP violation. This also
implies that the CPU has already verified that the VA and its
permissions for this access are valid. Therefore, I currently cannot
think of a case where the CPU raises an access fault but we cannot
find a VMA that contains this stval. In such a case, I would expect
the CPU to raise a page fault exception rather than an access fault.
Am I overlooking any scenarios?
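
To make sure we are talking about the same check, here is a rough
userspace model of the VMA-based idea as I understand it. It is only a
sketch under assumptions: struct toy_vma, toy_find_vma(),
classify_user_access_fault() and the REPORT_* codes are made-up names
for illustration, not kernel code, and a real handler would have to use
the kernel's own VMA lookup and signal delivery instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VMA: a half-open range [start, end). */
struct toy_vma {
    uint64_t start;
    uint64_t end;
};

/* Hypothetical result codes, named after the signal/si_code pair
 * that a real handler would deliver. */
enum fault_report {
    REPORT_SEGV_ACCERR, /* SIGSEGV with SEGV_ACCERR */
    REPORT_BUS_ADRALN   /* SIGBUS with BUS_ADRALN */
};

/* Linear search over the toy table; returns NULL if no range holds addr. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
                                          size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= vmas[i].start && addr < vmas[i].end)
            return &vmas[i];
    }
    return NULL;
}

/* Model of the proposal: for a userspace access fault, a VMA that
 * contains stval means "treat as misaligned access", and no VMA means
 * "treat as a normal access fault". */
static enum fault_report classify_user_access_fault(const struct toy_vma *vmas,
                                                    size_t n, uint64_t stval)
{
    assert(n == 0 || vmas != NULL);

    if (toy_find_vma(vmas, n, stval) != NULL)
        return REPORT_BUS_ADRALN;
    return REPORT_SEGV_ACCERR;
}
```

If my reasoning above is correct, the REPORT_SEGV_ACCERR branch in this
model would never be reached for a fault that the hardware reports as
an access fault, which is why I am asking.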
>
> >
> >> Did you encounter this situation where you were reported a SIGSEGV but
> >> it was actually a SIGBUS?
> >>
> > Yes, I encountered this issue when running the lockbus test in
> > stress-ng version v0.18.12. This testcase intends to use the AMO
> > instruction to access an unaligned address and expects to get a SIGBUS
> > signal due to this access.
> >
> >> Sorry again for the delay,
> >>
> >> Thanks,
> >>
> >> Alex
> >>
> >>
> >>> As I understand it, most access faults are typically caused by
> >>> violations of either PMA or PMP settings. If these settings are
> >>> considered a form of permission for a mapped object, then the current
> >>> implementation using SIGSEGV with SEGV_ACCERR makes sense and helps
> >>> simplify the fault-handling logic.
> >>>
> >>> I’d like to confirm whether this interpretation reflects the reasoning
> >>> behind the current implementation. Any clarification would be
> >>> appreciated. Thanks!
> >>>
> >>> _______________________________________________
> >>> linux-riscv mailing list
> >>> linux-riscv at lists.infradead.org
> >>> http://lists.infradead.org/mailman/listinfo/linux-riscv