[PATCH] arm64: mm: force write fault for atomic RMW instructions

Wed May 8 21:23:58 PDT 2024

On 5/8/24 22:45, Christoph Lameter (Ampere) wrote:
> On Wed, 8 May 2024, Anshuman Khandual wrote:
> 
>>> The atomic RMW instructions, for example, ldadd, actually do load +
>>> add + store in one instruction, so they may trigger two page faults: the
>>> first fault is a read fault, the second fault is a write fault.
>>
>> Will it possibly, or definitely, create two consecutive page faults? What
>> if the second write fault never came about? In that case a writable
>> page table entry would be created unnecessarily (or even wrongfully),
>> thus breaking the CoW.
> 
> An atomic RMW will always perform a write. If there is a read fault, then a write fault will follow.

Alright, but the wording above in the commit message is a bit misleading.

> 
>>> Some applications use atomic RMW instructions to populate memory, for
>>> example, openjdk uses atomic-add-0 to do pretouch (populate heap memory
>>
>> But why is a normal store operation not sufficient for pre-touching
>> the heap memory? Why is read-modify-write (RMW) required instead?
> 
> Sure, a regular write operation is sufficient, but you would have to modify existing applications to get that done. x86 does not take a read fault on atomics, so we have an issue there.

Understood. Not being able to change an application to optimize it
might not be a compelling argument on its own, but treating such atomic
operations differently in the page fault path for improved performance sounds
feasible. I will let others weigh in on this and on the possible need
for parity with x86 behaviour.

> 
>> If the memory address has some valid data, it must have already reached there
>> via a previous write access, which would have caused the initial CoW transition?
>> If the memory address has no valid data to begin with, why even use RMW?
> 
> Because the application can reasonably assume that all uninitialized data is zero, and therefore a prior write access is not necessary.

Alright, but again I wonder why an atomic operation is required to initialize
or pre-touch uninitialized data. Somehow it does not make sense unless
there is some more context here.
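For concreteness, the atomic-add-0 pretouch pattern under discussion can be sketched in user-space C. This is a minimal sketch, not OpenJDK's code: pretouch_atomic_add0() and map_and_pretouch() are hypothetical helper names, and the sketch assumes GCC or Clang (for the __atomic_fetch_add() builtin) on Linux. On arm64 with LSE atomics enabled (for example -march=armv8.1-a), the add of zero may compile to a single LDADD or STADD, i.e. the kind of instruction that, per the patch, takes a read fault followed by a write fault on a freshly mapped page.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: touch the first int of every page in
 * [start, start + len) with an atomic add of 0. Existing contents are
 * left unchanged; populating the page tables is the only goal.
 * Assumes start is page aligned and len is a multiple of page_size.
 */
static void pretouch_atomic_add0(void *start, size_t len, size_t page_size)
{
    char *p = start;
    char *end = p + len;

    for (; p < end; p += page_size)
        __atomic_fetch_add((int *)p, 0, __ATOMIC_RELAXED);
}

/* Hypothetical helper: map private anonymous memory, then pretouch it. */
static void *map_and_pretouch(size_t len)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return NULL;

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    pretouch_atomic_add0(mem, len, (size_t)page_size);
    return mem;
}
```

Adding zero leaves whatever is already stored in the word unchanged, whereas a plain store of zero would overwrite it; the thread does not say whether that is OpenJDK's actual motivation.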

> 
>>> Some other architectures also have code inspection in the page fault path,
>>> for example, SPARC and x86.
>>
>> Okay, I was about to ask: doesn't calling get_user() for all data
>> read page faults increase the cost of a hot code path in general, for
>> some potential savings in a very specific use case? Not sure if that
>> is worth the trade-off.
> 
> The instruction is cache hot since it must be present in the CPU cache for the fault. So the overhead is minimal.
> 

But couldn't a pagefault_disable()/pagefault_enable() window prevent concurrent
page faults for the current process, thus degrading its performance?
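The code inspection discussed above amounts to fetching the faulting instruction (the thread mentions get_user()) and decoding it. Below is a minimal user-space sketch of only the decode step and the resulting read-to-write upgrade decision. It assumes the A64 encoding of the FEAT_LSE "atomic memory operations" class; is_lse_atomic_rmw() and fault_needs_write() are hypothetical names, not kernel functions, and CAS (which lives in a separate encoding class) is deliberately not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A64 "atomic memory operations" class (FEAT_LSE), as assumed here:
 *   size[31:30] 111[29:27] V[26]=0 00[25:24] A[23] R[22] 1[21]
 *   Rs[20:16] o3[15] opc[14:12] 00[11:10] Rn[9:5] Rt[4:0]
 * The mask keeps the fixed bits and ignores size, A, R, o3, opc and
 * the register fields.
 */
#define AMO_CLASS_MASK 0x3F200C00u
#define AMO_CLASS_BITS 0x38200000u

/*
 * True for LD<op>/ST<op> (o3 == 0: ADD, CLR, EOR, SET, SMAX, SMIN,
 * UMAX, UMIN) and SWP (o3 == 1, opc == 000), which always write memory.
 * Every other o3 == 1 encoding in the class (e.g. LDAPR, a plain load)
 * is rejected.
 */
static bool is_lse_atomic_rmw(uint32_t insn)
{
    if ((insn & AMO_CLASS_MASK) != AMO_CLASS_BITS)
        return false;

    uint32_t o3 = (insn >> 15) & 0x1u;
    uint32_t opc = (insn >> 12) & 0x7u;

    return o3 == 0 || opc == 0;
}

/*
 * Hypothetical fault-path policy: treat a read fault as a write fault
 * when the faulting instruction is an atomic RMW, so a single fault can
 * install a writable entry instead of a read fault followed by a CoW
 * write fault.
 */
static bool fault_needs_write(bool is_write_fault, uint32_t insn)
{
    return is_write_fault || is_lse_atomic_rmw(insn);
}
```

Because the check is a mask-and-compare on a single 32-bit word, the decode itself is cheap; the cost debated above is the instruction fetch and any pagefault_disable()/pagefault_enable() window around it, which this sketch does not model.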