[DISCUSSION] kstack offset randomization: bugs and performance

Ryan Roberts ryan.roberts at arm.com
Mon Nov 24 09:50:14 PST 2025


On 24/11/2025 17:11, Kees Cook wrote:
> 
> 
> On November 24, 2025 6:36:25 AM PST, Will Deacon <will at kernel.org> wrote:
>> On Mon, Nov 17, 2025 at 11:31:22AM +0000, Ryan Roberts wrote:
>>> On 17/11/2025 11:30, Ryan Roberts wrote:
>>>> Could this give us a middle ground between strong-crng and
>>>> weak-timestamp-counter? Perhaps the main issue is that we need to store the
>>>> secret key for a long period?
>>>>
>>>>
>>>> Anyway, I plan to work up a series with the bugfixes and performance
>>>> improvements. I'll add the siphash approach as an experimental addition and get
>>>> some more detailed numbers for all the options. But wanted to raise it all here
>>>> first to get any early feedback.
>>
>> FWIW, I share Mark's concerns about using a counter for this. Given that
>> the feature currently appears to be both slow _and_ broken I'd vote for
>> either removing it or switching over to per-thread offsets as a first
>> step.
> 
> That it has potential weaknesses doesn't mean it should be entirely removed.
> 
>> We already have a per-task stack canary with
>> CONFIG_STACKPROTECTOR_PER_TASK so I don't understand the reluctance to
>> do something similar here.
> 
> That's not a reasonable comparison: the stack canary cannot change arbitrarily for a task or it would immediately crash on a function return. :)
> 
>> Speeding up the crypto feels like something that could happen separately.
> 
> Sure. But let's see what Ryan's patches look like. The suggested changes sound good to me.

Just to say I haven't forgotten about this; I ended up having to switch to
something more urgent. Hoping to get back to it later this week. I don't think
this is an urgent issue, so hopefully folks are ok waiting.

I propose to post whatever I end up with and then we can all discuss from
there. But the rough shape so far:

Fixes:
 - Remove choose_random_kstack_offset()
 - The arch passes randomness directly into add_random_kstack_offset() (fixes
   the migration bypass); see the sketch below
 - Move add_random_kstack_offset() to el0_svc()/el0_svc_compat() (before
   enabling interrupts) to fix the non-preemption requirement (arm64)
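
To make that concrete, here's a rough sketch of the shape I have in mind
(illustrative only, not final code; kstack_rand() is a placeholder that I
sketch further down):

/*
 * Sketch: the random value is supplied and consumed in one go, so nothing
 * is carried across the syscall in per-cpu state (closing the migration
 * bypass).
 */
#define add_random_kstack_offset(rand) do {				\
	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
				&randomize_kstack_offset)) {		\
		u32 offset = (rand);					\
		u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset));	\
		/* Keep the allocation alive after "ptr" loses scope. */\
		asm volatile("" :: "r"(ptr) : "memory");		\
	}								\
} while (0)

/*
 * arm64 (sketch, other details elided): apply the offset while interrupts
 * are still masked, so we can't be preempted or migrated between
 * generating the random value and using it.
 */
static void noinstr el0_svc(struct pt_regs *regs)
{
	enter_from_user_mode(regs);
	add_random_kstack_offset(kstack_rand());
	local_daif_restore(DAIF_PROCCTX);
	do_el0_svc(regs);
	exit_to_user_mode(regs);
}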

Perf Improvements:
 - Based on Jeremy's prng, but buffer the 32 bits and use 6 bits per syscall
   (so the cost of prng generation is amortized over 5 syscalls); see the
   sketch after this list
 - Reseed the prng using get_random_u64() every 64K prng invocations (so the
   cost of get_random_u64() is amortized over 64K*5 syscalls)
 - So while get_random_u64() still has a latency spike, it's so infrequent that
   it doesn't show up in p99.9 for my benchmarks.
 - If we want to change this to per-task state, I think the approach is
   amenable to that.
 - I'll keep the timer off limits for arm64.
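
Rough sketch of the buffered prng (again illustrative; prng_next() stands in
for Jeremy's generator, and the per-cpu access is safe here because this runs
before interrupts are enabled):

struct kstack_rng {
	u64 state;	/* prng state, periodically reseeded */
	u32 buf;	/* buffered 32-bit prng output */
	u8  bits;	/* bits remaining in buf */
	u16 reseed;	/* prng invocations until the next reseed */
};
static DEFINE_PER_CPU(struct kstack_rng, kstack_rng);

static __always_inline u32 kstack_rand(void)
{
	struct kstack_rng *rng = this_cpu_ptr(&kstack_rng);
	u32 rand;

	if (unlikely(rng->bits < 6)) {
		/* Refill: one prng step serves 5 syscalls (30 of 32 bits). */
		if (unlikely(rng->reseed-- == 0)) {
			/* Rare: ~once per 64K prng steps, i.e. 64K*5 syscalls. */
			rng->state = get_random_u64();
			rng->reseed = U16_MAX;
		}
		rng->buf = prng_next(&rng->state);
		rng->bits = 32;
	}
	rand = rng->buf & 0x3f;	/* hand out 6 bits */
	rng->buf >>= 6;
	rng->bits -= 6;
	return rand;
}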

That said, I'm seeing some inconsistencies in the performance measurements, so
I need to get those understood properly first.

Thanks,
Ryan


> 
> -Kees
> 
> 



