[v5 PATCH] arm64: mm: force write fault for atomic RMW instructions

Yang Shi yang at os.amperecomputing.com
Wed Jul 10 11:43:18 PDT 2024



On 7/10/24 2:22 AM, Catalin Marinas wrote:
> On Tue, Jul 09, 2024 at 03:29:58PM -0700, Yang Shi wrote:
>> On 7/9/24 11:35 AM, Catalin Marinas wrote:
>>> On Tue, Jul 09, 2024 at 10:56:55AM -0700, Yang Shi wrote:
>>>> On 7/4/24 3:03 AM, Catalin Marinas wrote:
>>>> I tested exec-only on QEMU tcg, but I don't have a hardware supported EPAN.
>>>> I don't think performance benchmark on QEMU tcg makes sense since it is
>>>> quite slow, such small overhead is unlikely measurable on it.
>>> Yeah, benchmarking under qemu is pointless. I think you can remove some
>>> of the ARM64_HAS_EPAN checks (or replace them with ARM64_HAS_PAN) just
>>> for testing. For security reasons, we removed this behaviour in commit
>>> 24cecc377463 ("arm64: Revert support for execute-only user mappings")
>>> but it's good enough for testing. This should give you PROT_EXEC-only
>>> mappings on your hardware.
>> Thanks for the suggestion. IIUC, I can still emulate exec-only even though
>> the hardware doesn't support EPAN? So that means a kernel read of an
>> exec-only area can still trigger a fault, right?
> Yes, it's been supported since ARMv8.0. We limited it to EPAN only since
> setting a PROT_EXEC mapping still allowed the kernel to access the
> memory even if PSTATE.PAN was set.
>
>> And 24cecc377463 ("arm64: Revert support for execute-only user mappings")
>> can't be reverted cleanly by git revert, so I did it manually as below.
> Yeah, I wasn't expecting that to work.
>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 6a8b71917e3b..0bdedd415e56 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -573,8 +573,8 @@ static int __kprobes do_page_fault(unsigned long far,
>> unsigned long esr,
>>                  /* Write implies read */
>>                  vm_flags |= VM_WRITE;
>>                  /* If EPAN is absent then exec implies read */
>> -               if (!alternative_has_cap_unlikely(ARM64_HAS_EPAN))
>> -                       vm_flags |= VM_EXEC;
>> +               //if (!alternative_has_cap_unlikely(ARM64_HAS_EPAN))
>> +               //      vm_flags |= VM_EXEC;
>>          }
>>
>>          if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs))
>> {
>> diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
>> index 642bdf908b22..d30265d424e4 100644
>> --- a/arch/arm64/mm/mmap.c
>> +++ b/arch/arm64/mm/mmap.c
>> @@ -19,7 +19,7 @@ static pgprot_t protection_map[16] __ro_after_init = {
>>          [VM_WRITE]                                      = PAGE_READONLY,
>>          [VM_WRITE | VM_READ]                            = PAGE_READONLY,
>>          /* PAGE_EXECONLY if Enhanced PAN */
>> -       [VM_EXEC]                                       = PAGE_READONLY_EXEC,
>> +       [VM_EXEC]                                       = PAGE_EXECONLY,
>>          [VM_EXEC | VM_READ]                             = PAGE_READONLY_EXEC,
>>          [VM_EXEC | VM_WRITE]                            = PAGE_READONLY_EXEC,
>>          [VM_EXEC | VM_WRITE | VM_READ]                  = PAGE_READONLY_EXEC,
> In theory you'd need to change the VM_SHARED | VM_EXEC entry as well.
> Otherwise it looks fine.
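
For reference, I think the corresponding VM_SHARED | VM_EXEC change would
be the following (untested sketch of just that entry, applying the same
substitution as the non-shared case):

```c
/* arch/arm64/mm/mmap.c, protection_map[] (untested sketch) */
	/* PAGE_EXECONLY if Enhanced PAN */
	[VM_SHARED | VM_EXEC]                           = PAGE_EXECONLY,
```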

Thanks. I just ran the same benchmark: the modified page_fault1_thread 
(which triggers read faults) for 100 iterations with 160 threads on 160 
cores. This should be the worst contention case, and I collected the max 
data (worst latency). It shows the patch may incur ~30% overhead for the 
exec-only case. The overhead should come just from the extra permission 
fault.

     N           Min           Max        Median           Avg Stddev
x 100        163840        219083        184471        183262 12593.229
+ 100        211198        285947        233608     238819.98 15253.967
Difference at 95.0% confidence
     55558 +/- 3877
     30.3161% +/- 2.11555%

This is a very extreme benchmark; I don't think any real-life workload 
will spend that much time (sys vs user) in page faults, particularly 
read faults.

With my atomic fault benchmark (populate 1G of memory with an atomic 
instruction, then manipulate the stored values over 100 iterations, so 
user time is much longer than sys time), I saw around 13% overhead in 
sys time due to the permission fault, but no noticeable change in user 
or real time.

So the permission fault does incur noticeable overhead for read faults 
on exec-only mappings, but it may not be that bad for real-life 
workloads.





More information about the linux-arm-kernel mailing list