[PATCH] arm64: mm: force write fault for atomic RMW instructions
Yang Shi
yang at os.amperecomputing.com
Fri May 17 10:35:13 PDT 2024
On 5/17/24 10:25 AM, Catalin Marinas wrote:
> On Fri, May 17, 2024 at 09:30:23AM -0700, Yang Shi wrote:
>> On 5/14/24 3:39 AM, Catalin Marinas wrote:
>>> It would be good to understand why openjdk is doing this instead of a
>>> plain write. Is it because it may be racing with some other threads
>>> already using the heap? That would be a valid pattern.
>> Yes, you are right. I think I quoted the JVM justification in earlier email,
>> anyway they said "permit use of memory concurrently with pretouch".
> Ah, sorry, I missed that. This seems like a valid reason.
I should have articulated this in the commit log. Will add this in v2.
>
>>> A point Will raised was on potential ABI changes introduced by this
>>> patch. The ESR_EL1 reported to user remains the same as per the hardware
>>> spec (read-only), so from a SIGSEGV we may have some slight behaviour
>>> changes:
>>>
>>> 1. PTE invalid:
>>>
>>> a) vma is VM_READ && !VM_WRITE permission - SIGSEGV reported with
>>> ESR_EL1.WnR == 0 in sigcontext with your patch. Without this
>>> patch, the PTE is mapped as PTE_RDONLY first and a subsequent
>>> fault will report SIGSEGV with ESR_EL1.WnR == 1.
>> I think I can do something like the below conceptually:
>>
>> if is_el0_atomic_instr && !is_write_abort
>> force_write = true
>>
>> if VM_READ && !VM_WRITE && force_write == true
> Nit: write implies read, so you only need to check !write.
>
>> vm_flags = VM_READ
>> mm_flags ~= FAULT_FLAG_WRITE
>>
>> Then we just fallback to read fault. The following write fault will trigger
>> SIGSEGV with consistent ABI.
> I think this should work. So instead of reporting the write fault
> directly in case of a read-only vma, we let the core code handle the
> read fault and first and we retry the atomic instruction.
Yes, just undo the force write when vma flags don't allow it.
>
>>> b) vma is !VM_READ && !VM_WRITE permission - SIGSEGV reported with
>>> ESR_EL1.WnR == 0, so no change from current behaviour, unless we
>>> fix the patch for (1.a) to fake the WnR bit which would change the
>>> current expectations.
>>>
>>> 2. PTE valid with PTE_RDONLY - we get a normal writeable fault in
>>> hardware, no need to fix ESR_EL1 up.
>>>
>>> The patch would have to address (1) above but faking the ESR_EL1.WnR bit
>>> based on the vma flags looks a bit fragile.
>> I think we don't need to fake the ESR_EL1.WnR bit with the fallback.
> I agree, with your approach above we don't need to fake WnR.
>
>>> Similarly, we have userfaultfd that reports the fault to user. I think
>>> in scenario (1) the kernel will report UFFD_PAGEFAULT_FLAG_WRITE with
>>> your patch but no UFFD_PAGEFAULT_FLAG_WP. Without this patch, there are
>>> indeed two faults, with the second having both UFFD_PAGEFAULT_FLAG_WP
>>> and UFFD_PAGEFAULT_FLAG_WRITE set.
>> I don't quite get what the problem is. IIUC, uffd just needs a signal from
>> kernel to tell this area will be written. It seems not break the semantic.
>> Added Peter Xu in this loop, who is the uffd developer. He may shed some
>> light.
> Not really familiar with uffd but just looking at the code, if a handler
> is registered for both MODE_MISSING and MODE_WP, currently the atomic
> instruction signals a user fault without UFFD_PAGEFAULT_FLAG_WRITE (the
> do_anonymous_page() path). If the page is mapped by the uffd handler as
> the zero page, a restart of the instruction would signal
> UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP (the do_wp_page()
> path).
>
> With your patch, we get the equivalent of UFFD_PAGEFAULT_FLAG_WRITE on
> the first attempt, just like having a STR instruction instead of
> separate LDR + STR (as the atomics behave from a fault perspective).
>
> However, I don't think that's a problem, the uffd handler should cope
> with an STR anyway, so it's not some unexpected combination of flags.
Yes, this is what I thought.
>
More information about the linux-arm-kernel
mailing list