[v4 PATCH] arm64: mm: force write fault for atomic RMW instructions
David Hildenbrand
david at redhat.com
Thu Jun 6 01:49:45 PDT 2024
On 05.06.24 22:37, Yang Shi wrote:
> Atomic RMW instructions such as ldadd perform load + add + store in a
> single instruction, yet per the ARM64 architecture spec they trigger
> two page faults: first a read fault, then a write fault.
>
> Some applications use atomic RMW instructions to populate memory. For
> example, OpenJDK versions v18 through v22 use atomic-add-0 to pretouch
> (populate heap memory at launch time), so that the memory can be used
> concurrently with the pretouch.
>
> But the double page fault has some problems:
>
> 1. Noticeable TLB overhead. For the read fault, the kernel installs the
> zero page with a read-only PTE. The write fault then triggers a
> write-protection fault (CoW), which allocates a new page and points the
> PTE at it; this requires TLB invalidation. The TLB invalidation and the
> mandatory memory barriers may incur significant overhead, particularly
> on machines with many cores.
>
> 2. Broken-up huge pages. If THP is enabled, the read fault installs the
> huge zero page. The later CoW breaks up the huge page and allocates
> base pages instead of a huge page. Applications then have to rely on
> khugepaged (a kernel thread) to collapse huge pages asynchronously.
> This also incurs a noticeable performance penalty.
>
> 3. 512x page faults with huge pages. Due to #2, applications take a
> write fault for every 4K area, which eliminates the speedup gained from
> using huge pages.
All interesting and valid points.
As raised, the app likely really should be using MADV_POPULATE_WRITE.
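For illustration, a minimal C sketch contrasting the two approaches, assuming Linux and a page-aligned anonymous mapping. The function names `pretouch_atomic_add0` and `pretouch_populate_write` are hypothetical and come from neither OpenJDK nor the kernel. The first touches one word per page with an atomic add of 0, the pattern the quoted description attributes to OpenJDK; the second asks the kernel to prefault the range writable via `madvise(MADV_POPULATE_WRITE)` (Linux 5.14+), which sidesteps the read-fault-then-CoW sequence.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* Linux UAPI value; older libc headers may lack it */
#endif

/* Hypothetical helper: pretouch by atomically adding 0 to one word per page.
 * With arm64 LSE atomics this is typically an LDADD, i.e. the read-modify-write
 * access that, per the quoted text, faults twice (read, then write/CoW).
 * Page contents are left unchanged. */
static void pretouch_atomic_add0(void *addr, size_t len, size_t page)
{
    for (size_t off = 0; off < len; off += page) {
        uint64_t *p = (uint64_t *)((char *)addr + off);
        __atomic_fetch_add(p, 0, __ATOMIC_RELAXED);
    }
}

/* Hypothetical helper: ask the kernel to populate the range as if written,
 * so pages are allocated writable up front. Returns 0 on success, or -1 with
 * errno set (e.g. EINVAL on kernels older than 5.14). */
static int pretouch_populate_write(void *addr, size_t len)
{
    return madvise(addr, len, MADV_POPULATE_WRITE);
}
```

Falling back to the atomic-add loop when `madvise()` fails with `EINVAL` is one plausible way to support older kernels; that fallback is an illustrative choice, not something proposed in this thread.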
Acked-by: David Hildenbrand <david at redhat.com>
--
Cheers,
David / dhildenb