[RFC/RFT PATCH 0/6] Improve get_random_u8() for use in randomize kstack

Ryan Roberts ryan.roberts at arm.com
Thu Nov 27 04:12:12 PST 2025


On 27/11/2025 09:22, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb at kernel.org>
> 
> Ryan reports that get_random_u16() is dominant in the performance
> profiling of syscall entry when kstack randomization is enabled [0].
> 
> This is the reason many architectures rely on a counter instead, and
> that, in turn, is the reason for the convoluted way the (pseudo-)entropy
> is gathered and recorded in a per-CPU variable.
> 
> Let's try to make the get_random_uXX() fast path faster, and switch to
> get_random_u8() so that we'll hit the slow path 2x less often. Then,
> wire it up in the syscall entry path, replacing the per-CPU variable,
> making the logic at syscall exit redundant.
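As I read the series, the fast path packs the batch generation and consume position into a single 64-bit word so one local cmpxchg claims a byte. The following userspace sketch is only my mental model of that scheme, not the patch itself; all names are mine, the refill is a stand-in for real ChaCha output, and it deliberately ignores the cross-CPU reuse race that patch 5 plugs:

```c
/*
 * Hypothetical sketch of a lockless batched-entropy fast path.
 * One atomic word packs: generation (high 32 bits) | position (low 32 bits).
 * A single compare-and-swap claims one byte from the per-CPU batch.
 */
#include <stdatomic.h>
#include <stdint.h>

#define BATCH_BYTES 768u	/* assumed; matches the refill interval below */

static uint8_t batch[BATCH_BYTES];
static _Atomic uint64_t state;	/* generation << 32 | position */

static void refill_slow_path(void)
{
	uint64_t old = atomic_load(&state);
	unsigned i;

	/* Real code would fetch fresh ChaCha output; a counter stands in. */
	for (i = 0; i < BATCH_BYTES; i++)
		batch[i] = (uint8_t)(old + i);
	/* Bump the generation and reset the position in one store. */
	atomic_store(&state, ((old >> 32) + 1) << 32);
}

static uint8_t get_random_u8_sketch(void)
{
	for (;;) {
		uint64_t old = atomic_load(&state);
		uint32_t pos = (uint32_t)old;

		if (pos >= BATCH_BYTES) {
			refill_slow_path();
			continue;
		}
		/* Claim one byte: same generation, position + 1. */
		if (atomic_compare_exchange_weak(&state, &old, old + 1))
			return batch[pos];
		/* Lost the race or the batch was regenerated: retry. */
	}
}
```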

I ran the same set of syscall benchmarks for this series as I did for my
series.

The baseline is v6.18-rc5 with stack randomization turned *off*. So the table
shows the performance cost of turning it on with the unmodified implementation,
then the reduced cost of turning it on with my changes applied, and finally the
cost of turning it on with Ard's changes applied:

arm64 (AWS Graviton3):
+-----------------+--------------+-------------+---------------+-----------------+
| Benchmark       | Result Class |   v6.18-rc5 | per-task-prng | fast-get-random |
|                 |              | rndstack-on |               |                 |
+=================+==============+=============+===============+=================+
| syscall/getpid  | mean (ns)    |  (R) 15.62% |     (R) 3.43% |      (R) 11.93% |
|                 | p99 (ns)     | (R) 155.01% |     (R) 3.20% |      (R) 11.00% |
|                 | p99.9 (ns)   | (R) 156.71% |     (R) 2.93% |      (R) 11.39% |
+-----------------+--------------+-------------+---------------+-----------------+
| syscall/getppid | mean (ns)    |  (R) 14.09% |     (R) 2.12% |      (R) 10.44% |
|                 | p99 (ns)     | (R) 152.81% |         1.55% |       (R) 9.94% |
|                 | p99.9 (ns)   | (R) 153.67% |         1.77% |       (R) 9.83% |
+-----------------+--------------+-------------+---------------+-----------------+
| syscall/invalid | mean (ns)    |  (R) 13.89% |     (R) 3.32% |      (R) 10.39% |
|                 | p99 (ns)     | (R) 165.82% |     (R) 3.51% |      (R) 10.72% |
|                 | p99.9 (ns)   | (R) 168.83% |     (R) 3.77% |      (R) 11.03% |
+-----------------+--------------+-------------+---------------+-----------------+

So this fixes the tail problem. I guess get_random_u8() only takes the slow
path every 768 calls, whereas get_random_u16() took it every 384 calls, though
I'm not sure that fully explains it.

But it's still a 10% cost on average.

Personally I think a 10% syscall cost is too much to pay for 6 bits of stack
randomisation. 3% is better, but still higher than we would all prefer, I'm sure.

Thanks,
Ryan

> 
> [0] https://lore.kernel.org/all/dd8c37bc-795f-4c7a-9086-69e584d8ab24@arm.com/
> 
> Cc: Kees Cook <kees at kernel.org>
> Cc: Ryan Roberts <ryan.roberts at arm.com>
> Cc: Will Deacon <will at kernel.org>
> Cc: Arnd Bergmann <arnd at arndb.de>
> Cc: Jeremy Linton <jeremy.linton at arm.com>
> Cc: Catalin Marinas <Catalin.Marinas at arm.com>
> Cc: Mark Rutland <mark.rutland at arm.com>
> Cc: Jason A. Donenfeld <Jason at zx2c4.com>
> 
> Ard Biesheuvel (6):
>   hexagon: Wire up cmpxchg64_local() to generic implementation
>   arc: Wire up cmpxchg64_local() to generic implementation
>   random: Use u32 to keep track of batched entropy generation
>   random: Use a lockless fast path for get_random_uXX()
>   random: Plug race in preceding patch
>   randomize_kstack: Use get_random_u8() at entry for entropy
> 
>  arch/Kconfig                       |  9 ++--
>  arch/arc/include/asm/cmpxchg.h     |  3 ++
>  arch/hexagon/include/asm/cmpxchg.h |  4 ++
>  drivers/char/random.c              | 49 ++++++++++++++------
>  include/linux/randomize_kstack.h   | 36 ++------------
>  init/main.c                        |  1 -
>  6 files changed, 49 insertions(+), 53 deletions(-)
> 
> 
> base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d



