[RFC PATCH v5 00/18] pkeys-based page table hardening
Kevin Brodsky
kevin.brodsky at arm.com
Thu Aug 21 00:23:42 PDT 2025
On 20/08/2025 18:18, Edgecombe, Rick P wrote:
> On Wed, 2025-08-20 at 18:01 +0200, Kevin Brodsky wrote:
>> Apologies, Thunderbird helpfully decided to wrap around that table...
>> Here's the unmangled table:
>>
>> +-------------------+----------------------------------+------------------+---------------+
>> | Benchmark         | Result Class                     | Without batching | With batching |
>> +===================+==================================+==================+===============+
>> | mmtests/kernbench | real time                        | 0.32%            | 0.35%         |
>> |                   | system time                      | (R) 4.18%        | (R) 3.18%     |
>> |                   | user time                        | 0.08%            | 0.20%         |
>> +-------------------+----------------------------------+------------------+---------------+
>> | micromm/fork      | fork: h:0                        | (R) 221.39%      | (R) 3.35%     |
>> |                   | fork: h:1                        | (R) 282.89%      | (R) 6.99%     |
>> +-------------------+----------------------------------+------------------+---------------+
>> | micromm/munmap    | munmap: h:0                      | (R) 17.37%       | -0.28%        |
>> |                   | munmap: h:1                      | (R) 172.61%      | (R) 8.08%     |
>> +-------------------+----------------------------------+------------------+---------------+
>> | micromm/vmalloc   | fix_size_alloc_test: p:1, h:0    | (R) 15.54%       | (R) 12.57%    |
> Both this and the previous one have the 95% confidence interval. So it saw a 16%
> speed up with direct map modification. Possible?
Positive numbers mean performance degradation ("(R)" actually stands for
regression), so in that case the protection is adding a 16%/13%
overhead. Here this is mainly due to the added pkey register switching
(+ barrier) happening on every call to vmalloc() and vfree(), which has
a large relative impact since only one page is being allocated/freed.
>> |                   | fix_size_alloc_test: p:4, h:0    | (R) 39.18%       | (R) 9.13%     |
>> |                   | fix_size_alloc_test: p:16, h:0   | (R) 65.81%       | 2.97%         |
>> |                   | fix_size_alloc_test: p:64, h:0   | (R) 83.39%       | -0.49%        |
>> |                   | fix_size_alloc_test: p:256, h:0  | (R) 87.85%       | (I) -2.04%    |
>> |                   | fix_size_alloc_test: p:16, h:1   | (R) 51.21%       | 3.77%         |
>> |                   | fix_size_alloc_test: p:64, h:1   | (R) 60.02%       | 0.99%         |
>> |                   | fix_size_alloc_test: p:256, h:1  | (R) 63.82%       | 1.16%         |
>> |                   | random_size_alloc_test: p:1, h:0 | (R) 77.79%       | -0.51%        |
>> |                   | vm_map_ram_test: p:1, h:0        | (R) 30.67%       | (R) 27.09%    |
>> +-------------------+----------------------------------+------------------+---------------+
> Hmm, still surprisingly low to me, but ok. It would be good to have x86 and arm
> work the same, but I don't think we have line of sight to x86 currently. And I
> actually never did real benchmarks.
It would certainly be good to get numbers on x86 as well - I'm hoping
that someone with a better understanding of x86 than myself could
implement kpkeys on x86 at some point, so that we can run the same
benchmarks there.
- Kevin
More information about the linux-arm-kernel mailing list