[RFC PATCH v5 00/18] pkeys-based page table hardening
Kevin Brodsky
kevin.brodsky at arm.com
Thu Aug 21 00:23:42 PDT 2025
On 20/08/2025 18:18, Edgecombe, Rick P wrote:
> On Wed, 2025-08-20 at 18:01 +0200, Kevin Brodsky wrote:
>> Apologies, Thunderbird helpfully decided to wrap around that table...
>> Here's the unmangled table:
>>
>> +-------------------+----------------------------------+------------------+---------------+
>> | Benchmark         | Result Class                     | Without batching | With batching |
>> +===================+==================================+==================+===============+
>> | mmtests/kernbench | real time                        | 0.32%            | 0.35%         |
>> |                   | system time                      | (R) 4.18%        | (R) 3.18%     |
>> |                   | user time                        | 0.08%            | 0.20%         |
>> +-------------------+----------------------------------+------------------+---------------+
>> | micromm/fork      | fork: h:0                        | (R) 221.39%      | (R) 3.35%     |
>> |                   | fork: h:1                        | (R) 282.89%      | (R) 6.99%     |
>> +-------------------+----------------------------------+------------------+---------------+
>> | micromm/munmap    | munmap: h:0                      | (R) 17.37%       | -0.28%        |
>> |                   | munmap: h:1                      | (R) 172.61%      | (R) 8.08%     |
>> +-------------------+----------------------------------+------------------+---------------+
>> | micromm/vmalloc   | fix_size_alloc_test: p:1, h:0    | (R) 15.54%       | (R) 12.57%    |
> Both this and the previous one have the 95% confidence interval. So it saw a 16%
> speed up with direct map modification. Possible?
Positive numbers mean performance degradation ("(R)" actually stands for
regression), so in that case the protection is adding a 16%/13%
overhead. Here this is mainly due to the added pkey register switching
(+ barrier) happening on every call to vmalloc() and vfree(), which has
a large relative impact since only one page is being allocated/freed.
>> |                   | fix_size_alloc_test: p:4, h:0    | (R) 39.18%       | (R) 9.13%     |
>> |                   | fix_size_alloc_test: p:16, h:0   | (R) 65.81%       | 2.97%         |
>> |                   | fix_size_alloc_test: p:64, h:0   | (R) 83.39%       | -0.49%        |
>> |                   | fix_size_alloc_test: p:256, h:0  | (R) 87.85%       | (I) -2.04%    |
>> |                   | fix_size_alloc_test: p:16, h:1   | (R) 51.21%       | 3.77%         |
>> |                   | fix_size_alloc_test: p:64, h:1   | (R) 60.02%       | 0.99%         |
>> |                   | fix_size_alloc_test: p:256, h:1  | (R) 63.82%       | 1.16%         |
>> |                   | random_size_alloc_test: p:1, h:0 | (R) 77.79%       | -0.51%        |
>> |                   | vm_map_ram_test: p:1, h:0        | (R) 30.67%       | (R) 27.09%    |
>> +-------------------+----------------------------------+------------------+---------------+
> Hmm, still surprisingly low to me, but ok. It would be good to have x86 and arm
> work the same, but I don't think we have line of sight to x86 currently. And I
> actually never did real benchmarks.
It would certainly be good to get numbers on x86 as well - I'm hoping
that someone with a better understanding of x86 than myself could
implement kpkeys on x86 at some point, so that we can run the same
benchmarks there.
- Kevin
More information about the linux-arm-kernel mailing list