[v5 PATCH] arm64: mm: force write fault for atomic RMW instructions

Fri Jul 5 11:51:33 PDT 2024

On Fri, 5 Jul 2024, Catalin Marinas wrote:

> There's nothing about arm64 in there and it looks like the code prefers
> MADV_POPULATE_WRITE if THPs are enabled (which is the case in all
> enterprise distros). I can't tell whether the change was made to work
> around the arm64 behaviour, there's no commit log (it was contributed by
> Ampere).

It took us a long time, and numerous developers and QA teams, to get to
this insight. You don't want to replicate this for other applications.

> There's a separate thread with the mm folk on the THP behaviour for
> pmd_none() vs pmd mapping the zero huge page but it is more portable for
> OpenJDK to use madvise() than guess the kernel behaviour and touch small
> pages or a single large page. Even if one claims that atomic_add(0) is
> portable across operating systems, the OpenJDK code was already treating
> Linux as a special case in the presence of THP.

Other apps have neither such a vibrant developer community nor Ampere
employees contributing. They will never know and will just say ARM has
bad performance.
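
For anyone following along, here is a rough sketch of the two pretouch
strategies being compared: madvise(MADV_POPULATE_WRITE) where the kernel
supports it, otherwise an atomic add of 0 to each page. This is not the
OpenJDK code. pretouch() is a hypothetical helper, and the fallback
assumes the compiler really emits an atomic RMW for the add of 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23	/* Linux 5.14+, older headers lack it */
#endif

/*
 * Hypothetical helper: make every page of a page-aligned range
 * present and writable. Returns 1 if madvise() did the work, 0 if
 * we fell back to touching each page by hand.
 */
static int pretouch(void *addr, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t off;

	assert(page > 0);

	/* Preferred path: the kernel takes the write faults for us. */
	if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
		return 1;

	/*
	 * Fallback: atomic add of 0 leaves the data unchanged but is
	 * still a store as far as the source is concerned. On arm64
	 * this is the access pattern that can fault twice per page.
	 */
	for (off = 0; off + sizeof(int) <= len; off += (size_t)page)
		__atomic_fetch_add((int *)((char *)addr + off), 0,
				   __ATOMIC_RELAXED);
	return 0;
}
```

An application that only has the fallback loop, which is what you get
from portable code, is the case the patch is meant to help.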

>> It would be much simpler to just merge the patch and be done with it.
>> Otherwise this issue will continue to cause uncountably many hours of
>> anguish for sysadmins and developers all over the Linux ecosystem trying to
>> figure out what in the world is going on with ARM.
>
> People will be happy until one enables execute-only ELF text sections in
> a distro and all that opcode parsing will add considerable overhead for
> many read faults (those with a writeable vma).

The opcode is in the L1 cache since we just faulted on it. There is no
"considerable" overhead.
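
To put a number on "opcode parsing": once the instruction word is in
hand, the classification is a single mask and compare. The sketch below
is not the patch's code. The helper name is made up, and the mask and
value are my reading of the A64 "atomic memory operations" encoding
class (LDADD, LDCLR, LDSET, SWP and friends), so treat them as an
assumption. It does not cover CAS or exclusive load/store pairs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed encoding of the A64 "atomic memory operations" class:
 * bits 29:27 == 111, bits 25:24 == 00, bit 21 == 1, bits 11:10 == 00.
 */
#define ATOMIC_MEMOP_MASK	0x3b200c00u
#define ATOMIC_MEMOP_VALUE	0x38200000u

/* Hypothetical helper: does this instruction word fall in that class? */
static bool insn_is_atomic_memop(uint32_t insn)
{
	return (insn & ATOMIC_MEMOP_MASK) == ATOMIC_MEMOP_VALUE;
}
```

With this, 0xb8210002 (ldadd w1, w2, [x0]) matches and 0xb9400002
(ldr w2, [x0]) does not.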

> I'd also like to understand (probably have to re-read the older threads)
> whether the overhead is caused mostly by the double fault or the actual
> breaking of a THP. For the latter, the mm folk are willing to change the
> behaviour so that pmd_none() and pmd to the zero huge page are treated
> similarly (i.e. allocate a huge page on write fault). If that's good
> enough, I'd rather not merge this patch (or some form of it) and wait
> for a proper fix in hardware in the future.

THP is a secondary effect here. Note that similar approaches have been
implemented for other architectures. This is not a new approach; it is
widely used on other platforms.

If people working on other Linux platforms came across this strange
discussion, they would come to the same conclusion that I have.

> Just to be clear, there are still potential issues to address (or
> understand the impact of) in this patch with exec-only mappings and
> the performance gain _after_ the THP behaviour changed in the mm code.
> We can make a call once we have more data but, TBH, my inclination is
> towards 'no' given that OpenJDK already supports madvise() and it's not
> arm64 specific.

It is arm64 specific. Other Linux architectures either have
optimizations for similar issues in their arch code, as mentioned in the
patch, or their processors do not double fault.

Is there a particular reason for ARM, as a processor manufacturer, to
oppose this patch? We have mostly hand waving and speculation coming
from you here.

What the patch does is clearly beneficial and it is an established 
way of implementing read->write fault handling.