[PATCH] arm64: mm: force write fault for atomic RMW instructions

Tue May 14 08:57:58 PDT 2024

On 14.05.24 12:39, Catalin Marinas wrote:
> On Fri, May 10, 2024 at 10:13:02AM -0700, Yang Shi wrote:
>> On 5/10/24 5:11 AM, Catalin Marinas wrote:
>>> On Tue, May 07, 2024 at 03:35:58PM -0700, Yang Shi wrote:
>>>> An atomic RMW instruction, for example ldadd, actually does load +
>>>> add + store in one instruction, so it may trigger two page faults:
>>>> the first is a read fault, the second a write fault.
>>>>
>>>> Some applications use atomic RMW instructions to populate memory, for
>>>> example, openjdk uses atomic-add-0 to do pretouch (populate heap memory
>>>> at launch time) between v18 and v22.
>>> I'd also argue that this should be optimised in openjdk. Is an LDADD
>>> more efficient on your hardware than a plain STR? I hope it only does
>>> one operation per page rather than per long. There's also MAP_POPULATE
>>> that openjdk can use to pre-fault the pages with no additional fault.
>>> This would be even more efficient than any store or atomic operation.
>>
>> It is not about whether an atomic is more efficient than a plain store
>> on our hardware or not. It is an arch-independent solution used by openjdk.
> 
> It may be arch independent but it's not a great choice. If you run this
> on pre-LSE atomics hardware (ARMv8.0), this operation would involve
> LDXR+STXR and there's no way for the kernel to "upgrade" it to a write
> operation on the first LDXR fault.
> 
> It would be good to understand why openjdk is doing this instead of a
> plain write. Is it because it may be racing with some other threads
> already using the heap? That would be a valid pattern.

Maybe openjdk should be switching to madvise(MADV_POPULATE_WRITE). QEMU
did that for the preallocate/populate use case.
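
For illustration, a minimal and untested sketch of what pretouch via
MADV_POPULATE_WRITE could look like. The helper name pretouch_range() is
made up for this example and is not taken from openjdk or QEMU; the
fallback #define assumes the asm-generic value of the constant, and the
advice itself needs Linux 5.14 or later (older kernels return EINVAL).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_POPULATE_WRITE
/* Assumption: asm-generic value, for building against older libc headers. */
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: prefault [addr, addr + len) as writable with a
 * single syscall instead of touching every page from userspace.
 * addr must be page aligned. Returns 0 on success or a negative errno
 * value (-EINVAL on kernels that do not know MADV_POPULATE_WRITE).
 */
static int pretouch_range(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;
    return -errno;
}
```

The range gets populated as if each page had been written to, but the
page contents are left untouched, so this should also be fine if other
threads are already using the heap. A caller that sees -EINVAL would
have to fall back to whatever it does today.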

-- 
Cheers,

David / dhildenb