Question About Access Fault Handling in the Kernel
Alexandre Ghiti
alex at ghiti.fr
Tue Aug 12 02:32:48 PDT 2025
On 8/12/25 10:44, Vincent Chen wrote:
> On Thu, Aug 7, 2025 at 5:13 PM Alexandre Ghiti <alex at ghiti.fr> wrote:
>>
>> On 7/23/25 04:08, Vincent Chen wrote:
>>> On Thu, Jul 17, 2025 at 9:59 PM Alexandre Ghiti <alex at ghiti.fr> wrote:
>>>> Hi Vincent,
>>>>
>>>> First, sorry for the very late reply.
>>>>
>>> Hi Alexandre,
>>>
>>> No problem, I'm glad to hear that someone else is also interested in this issue.
>>>
>>>> On 6/3/25 06:25, Vincent Chen wrote:
>>>>> Hi all,
>>>>> I have a question regarding the current handling flow for access
>>>>> faults in the kernel.
>>>>>
>>>>> In the current implementation, the kernel consistently reports all
>>>>> access faults using a SIGSEGV with error code SEGV_ACCERR, where
>>>>> SEGV_ACCERR signifies "Invalid permissions for mapped object"
>>>>> according to the POSIX specification.
>>>>>
>>>>> However, the RISC-V privileged specification states that
>>>>> "Implementations may raise access-fault exceptions instead of
>>>>> address-misaligned exceptions for some misaligned accesses, indicating
>>>>> the instruction should not be emulated by a trap handler". In other
>>>>> words, a load/store access fault may be caused by a misaligned AMO
>>>>> instruction. In such cases, it seems more appropriate to report the
>>>>> error using SIGBUS with the error code BUS_ADRALN, which indicates
>>>>> "Invalid address alignment."
>>>> I think you're right, it seems more appropriate. But then how do we
>>>> recognize an access fault exception that is caused by a misaligned
>>>> access? Do you have a solution? If that does not slow down the "normal"
>>>> access fault, I guess we can give it a try, what do you think?
>>>>
>>> Indeed, I do not yet have a concrete solution to this problem. My
>>> current thought is that we may need access to the full PMA
>>> configuration and PMP settings in order to accurately diagnose the
>>> root cause of an access fault.
>>>
>>> For example, when the kernel encounters an access fault with stval =
>>> 0x1000002, we cannot directly determine whether the cause is a
>>> misaligned AMO access, a PMP violation, or some other PMA violation.
>>> As a result, the kernel may not be able to raise an appropriate signal
>>> with the correct error code.
>>>
>>> From a security perspective, exposing the PMA configuration outside of
>>> M-mode may not be appropriate. If this concern is valid, we may need
>>> to rely on OpenSBI to help diagnose such faults. However, that would
>>> add overhead to the handling of "normal" access faults.
>>>
>>> On the other hand, the current implementation always reports an
>>> access fault via SIGSEGV with the SEGV_ACCERR error code. In my view,
>>> this reporting could be more precise in some cases. For example, when
>>> the CPU executes an AMO instruction on a region marked with the
>>> AMONone PMA attribute, reporting the fault as SIGBUS with the
>>> BUS_OBJERR error code might be more appropriate. However, I have not
>>> found any discussion of similar issues on the mailing list, which
>>> makes me question whether this approach is worth pursuing.
>>
>> I don't think we need PMA/PMP information to distinguish all possible
>> access fault errors; it seems more like a platform issue to me. I mean,
>> you would not expose an AMONone region as normal RAM to a regular
>> distro: you must know that your platform will need a specific userspace
>> and kernel, right?
>>
>> For your initial problem, I think that for userspace access faults we
>> can try to find a VMA that contains stval. If one exists, it is a
>> misaligned access fault; otherwise, it is a "normal" access fault.
>> Would that work?
>>
> I agree that most problems reported as access faults can be
> categorized as platform issues. We can assume that the entire Linux
> environment operates within a memory region configured with the
> correct PMP and PMA settings, and that Linux does not create any
> VA-to-PA mapping where the PA is inaccessible.
>
> Based on this assumption, I think your idea makes sense. However, I
> have another question regarding it. If I understand correctly, an
> access fault exception occurs due to a PMA/PMP violation. This also
> implies that the CPU has already verified that the VA and its
> permissions for this access are valid. Therefore, I currently cannot
> think of a case where the CPU raises an access fault but we cannot
> find a VMA that contains this stval. In such a case, I would expect
> the CPU to raise a page fault exception rather than an access fault.
> Am I overlooking any scenarios?
That would mean that, under the assumptions above, the only way to get
an access fault is a misaligned access. But then I wonder whether
assuming that the kernel never maps inaccessible physical memory is
correct: we could miss some kernel bugs (for example, a wrong mapping
of devices or device memory).
But anyway, it feels like we should at least always raise SIGBUS on an
access fault, right? The specific error code may be harder to get: we
could check VM_IO, or make sure that the physical address exists in the
physical address map, etc. But that sounds "dirty"...
>
>>>> Did you encounter a situation where a SIGSEGV was reported but it
>>>> should actually have been a SIGBUS?
>>>>
>>> Yes, I encountered this issue when running the lockbus test in
>>> stress-ng v0.18.12. This test case uses an AMO instruction to access
>>> an unaligned address and expects to receive a SIGBUS signal as a
>>> result.
>>>
>>>> Sorry again for the delay,
>>>>
>>>> Thanks,
>>>>
>>>> Alex
>>>>
>>>>
>>>>> As I understand it, most access faults are typically caused by
>>>>> violations of either PMA or PMP settings. If these settings are
>>>>> considered a form of permission for a mapped object, then the current
>>>>> implementation using SIGSEGV with SEGV_ACCERR makes sense and helps
>>>>> simplify the fault-handling logic.
>>>>>
>>>>> I’d like to confirm whether this interpretation reflects the reasoning
>>>>> behind the current implementation. Any clarification would be
>>>>> appreciated. Thanks!
>>>>>
>>>>> _______________________________________________
>>>>> linux-riscv mailing list
>>>>> linux-riscv at lists.infradead.org
>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv