[DISCUSSION] kstack offset randomization: bugs and performance
Ryan Roberts
ryan.roberts at arm.com
Mon Nov 24 09:50:14 PST 2025
On 24/11/2025 17:11, Kees Cook wrote:
>
>
> On November 24, 2025 6:36:25 AM PST, Will Deacon <will at kernel.org> wrote:
>> On Mon, Nov 17, 2025 at 11:31:22AM +0000, Ryan Roberts wrote:
>>> On 17/11/2025 11:30, Ryan Roberts wrote:
>>>> Could this give us a middle ground between strong-crng and
>>>> weak-timestamp-counter? Perhaps the main issue is that we need to store the
>>>> secret key for a long period?
>>>>
>>>>
>>>> Anyway, I plan to work up a series with the bugfixes and performance
>>>> improvements. I'll add the siphash approach as an experimental addition and get
>>>> some more detailed numbers for all the options. But wanted to raise it all here
>>>> first to get any early feedback.
>>
>> FWIW, I share Mark's concerns about using a counter for this. Given that
>> the feature currently appears to be both slow _and_ broken I'd vote for
>> either removing it or switching over to per-thread offsets as a first
>> step.
>
> That it has potential weaknesses doesn't mean it should be entirely removed.
>
>> We already have a per-task stack canary with
>> CONFIG_STACKPROTECTOR_PER_TASK so I don't understand the reluctance to
>> do something similar here.
>
> That's not a reasonable comparison: the stack canary cannot change arbitrarily for a task or it would immediately crash on a function return. :)
>
>> Speeding up the crypto feels like something that could happen separately.
>
> Sure. But let's see what Ryan's patches look like. The suggested changes sound good to me.
Just to say I haven't forgotten about this; I ended up having to switch to
something more urgent. Hoping to get back to it later this week. I don't think
this is an urgent issue, so hopefully folks are ok waiting.
I propose to post whatever I end up with, then we can all discuss from there.
But the rough shape so far:
Fixes:
- Remove choose_random_kstack_offset()
- Have the arch pass the random value directly into add_random_kstack_offset()
  (fixes the migration bypass); rough sketch below
- Move add_random_kstack_offset() to el0_svc()/el0_svc_compat() (before
  enabling interrupts) to satisfy the non-preemption requirement (arm64)
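To illustrate the first two items, here's a rough sketch (not the actual
patches) of add_random_kstack_offset() taking the entropy directly, so the
offset used for this syscall no longer depends on a per-cpu value written by
a *previous* syscall (which is what made the migration bypass possible):

#define add_random_kstack_offset(rand) do {				\
	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
				&randomize_kstack_offset)) {		\
		/* Consume fresh entropy for this syscall only. */	\
		u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(rand));	\
		/* Keep the allocation live after "ptr" loses scope. */ \
		asm volatile("" :: "r"(ptr) : "memory");		\
	}								\
} while (0)

The arm64 call site would then be early in el0_svc()/el0_svc_compat(),
before local_daif_restore() re-enables interrupts, so the whole sequence
runs without risk of preemption or migration.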
Perf Improvements:
- Based on Jeremy's prng, but buffer the 32 bits and consume 6 bits per
  syscall (so the cost of prng generation is amortized over 5 syscalls);
  see the sketch after this list
- Reseed the prng from get_random_u64() every 64K prng invocations (so the
  cost of get_random_u64() is amortized over 64K*5 syscalls)
- get_random_u64() still has a latency spike, but it's now so infrequent
  that it doesn't show up in p99.9 for my benchmarks
- If we want to change this to per-task state, I think it's all amenable
  to that
- I'll keep the timer (counter) off limits for arm64
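As a minimal sketch of the buffering scheme (the names here -- kstack_rnd,
kstack_rand6(), kstack_prng_next(), kstack_prng_reseed() -- are made up for
illustration, not real APIs, and this assumes it's called with interrupts
disabled per the el0_svc() change above):

#include <linux/percpu.h>
#include <linux/random.h>

struct kstack_rnd_state {
	u32 buf;		/* buffered prng output */
	u8  bits;		/* bits still unconsumed in buf */
	u16 until_reseed;	/* refills left; u16 wrap gives the 64K period */
};

static DEFINE_PER_CPU(struct kstack_rnd_state, kstack_rnd);

static __always_inline u32 kstack_rand6(void)
{
	struct kstack_rnd_state *s = raw_cpu_ptr(&kstack_rnd);
	u32 r;

	if (s->bits < 6) {
		/* One get_random_u64() per ~64K prng refills. */
		if (unlikely(s->until_reseed-- == 0))
			kstack_prng_reseed(get_random_u64());	/* hypothetical */
		/* One prng step per 5 syscalls (5 * 6 = 30 <= 32 bits). */
		s->buf = kstack_prng_next();			/* hypothetical */
		s->bits = 32;
	}
	r = s->buf & 0x3f;
	s->buf >>= 6;
	s->bits -= 6;
	return r;
}

The 6 bits returned per syscall would then feed straight into
add_random_kstack_offset() as sketched above.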
I'm seeing some inconsistencies in the performance measurements, though, so I
need to get those understood properly first.
Thanks,
Ryan
>
> -Kees
>
>