Question About Access Fault Handling in the Kernel

Thu Aug 7 02:13:14 PDT 2025

On 7/23/25 04:08, Vincent Chen wrote:
> On Thu, Jul 17, 2025 at 9:59 PM Alexandre Ghiti <alex at ghiti.fr> wrote:
>> Hi Vincent,
>>
>> First, sorry for the very late reply.
>>
> Hi Alexandre,
>
> No problem, I'm glad to hear that someone else is also interested in this issue.
>
>> On 6/3/25 06:25, Vincent Chen wrote:
>>> Hi all,
>>> I have a question regarding the current handling flow for access
>>> faults in the kernel.
>>>
>>> In the current implementation, the kernel consistently reports all
>>> access faults using a SIGSEGV with error code SEGV_ACCERR, where
>>> SEGV_ACCERR signifies "Invalid permissions for mapped object"
>>> according to the POSIX specification.
>>>
>>> However, the RISC-V privileged specification states that
>>> "Implementations may raise access-fault exceptions instead of
>>> address-misaligned exceptions for some misaligned accesses, indicating
>>> the instruction should not be emulated by a trap handler". In other
>>> words, a load/store access fault may be caused by a misaligned AMO
>>> instruction. In such cases, it seems more appropriate to report the
>>> error using SIGBUS with the error code BUS_ADRALN, which indicates
>>> "Invalid address alignment."
>>
>> I think you're right, it seems more appropriate. But then how do we
>> recognize an access fault exception that is caused by a misaligned
>> access? Do you have a solution? If that does not slow down the "normal"
>> access fault, I guess we can give it a try, what do you think?
>>
> Indeed, I do not yet have a concrete solution to this problem. My
> current thought is that we may need access to the full PMA
> configuration and PMP settings in order to accurately diagnose the
> root cause of an access fault.
>
> For example, when the kernel encounters an access fault with stval =
> 0x1000002, we cannot directly determine whether the cause is a
> misaligned AMO access, a PMP violation, or some other PMA violation.
> As a result, the kernel may not be able to raise an appropriate signal
> with the correct error code.
>
> From a security perspective, exposing PMA configurations outside of
> M-mode may not be appropriate. If this concern is valid, we may need
> to rely on OpenSBI to assist in diagnosing such faults. However, this
> would add overhead and slow down handling of the "normal" access
> fault.
>
> On the other hand, the current implementation consistently reports an
> access fault via the SIGSEGV signal with the SEGV_ACCERR error code.
> To me, this approach could be made more precise in some cases. For
> example, when the CPU executes an AMO instruction on a region marked
> with the AMONone PMA attribute, reporting the fault using SIGBUS with
> the BUS_OBJERR error code might be more appropriate. However, I have
> not found any discussion of similar issues on the mailing list, which
> makes me question whether this approach is worth pursuing.

I don't think we need PMA/PMP to distinguish all possible access fault
errors; it seems more like a platform issue to me. I mean, do not expose
an AMONone region as normal RAM with a normal distro: you must know that
your platform will need a specific userspace and kernel, right?

For your initial problem, I think that for userspace access faults, we
can try to find a VMA that contains stval: if one exists, it's a
misaligned access fault; otherwise it's a "normal" access fault. Would
that work?
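
To make the idea concrete, here is a minimal userspace sketch of that
decision, not kernel code. Everything in it is hypothetical: "struct
region" stands in for a VMA, and "classify_access_fault" is an
illustrative name, not an existing kernel function. It also bakes in the
assumption above that an access fault on a mapped address can only come
from a misaligned access, which would not hold on a platform that maps
e.g. an AMONone region into userspace.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for SEGV_ACCERR and BUS_ADRALN */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VMA: covers the range [start, end). */
struct region {
	uintptr_t start;
	uintptr_t end;
};

/* Signal number and si_code the fault would be reported with. */
struct fault_report {
	int signo;
	int code;
};

/* Return 1 if addr lies inside any of the n regions, 0 otherwise. */
static int address_is_mapped(const struct region *map, size_t n,
			     uintptr_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= map[i].start && addr < map[i].end)
			return 1;
	}
	return 0;
}

/*
 * Assumed heuristic: an access fault whose stval falls inside a mapped
 * region is treated as a misaligned access (SIGBUS/BUS_ADRALN); any
 * other access fault keeps today's SIGSEGV/SEGV_ACCERR.
 */
static struct fault_report classify_access_fault(const struct region *map,
						 size_t n, uintptr_t stval)
{
	if (address_is_mapped(map, n, stval))
		return (struct fault_report){ SIGBUS, BUS_ADRALN };

	return (struct fault_report){ SIGSEGV, SEGV_ACCERR };
}
```

With a single region covering 0x1000000-0x1001000, the stval from your
example (0x1000002) would be classified as SIGBUS/BUS_ADRALN, while an
address outside any region would still get SIGSEGV/SEGV_ACCERR. The
lookup only runs on the access-fault path, so the cost question is how
expensive the real VMA lookup is there.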

>
>> Did you encounter a situation where a SIGSEGV was reported but it
>> should actually have been a SIGBUS?
>>
> Yes, I encountered this issue when running the lockbus test in
> stress-ng v0.18.12. This test case uses an AMO instruction to access
> an unaligned address and expects to get a SIGBUS signal as a result
> of this access.
>
>> Sorry again for the delay,
>>
>> Thanks,
>>
>> Alex
>>
>>
>>> As I understand it, most access faults are typically caused by
>>> violations of either PMA or PMP settings. If these settings are
>>> considered a form of permission for a mapped object, then the current
>>> implementation using SIGSEGV with SEGV_ACCERR makes sense and helps
>>> simplify the fault-handling logic.
>>>
>>> I’d like to confirm whether this interpretation reflects the reasoning
>>> behind the current implementation. Any clarification would be
>>> appreciated. Thanks!
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv