[PATCH 5/6] sched/preempt: add PREEMPT_DYNAMIC using static keys

Frederic Weisbecker frederic at kernel.org
Mon Dec 13 14:05:01 PST 2021


On Tue, Nov 09, 2021 at 05:24:07PM +0000, Mark Rutland wrote:
> Where an architecture selects HAVE_STATIC_CALL but not
> HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
> which will either branch to a callee or return to the caller.
> 
> On such architectures, a number of constraints can conspire to make
> those trampolines more complicated and potentially less useful than we'd
> like. For example:
> 
> * Hardware and software control flow integrity schemes can require the
>   addition of "landing pad" instructions (e.g. `BTI` for arm64), which
>   will also be present at the "real" callee.
> 
> * Limited branch ranges can require that trampolines generate or load an
>   address into a register and perform an indirect branch (or at least
>   have a slow path that does so). This loses some of the benefits of
>   having a direct branch.
> 
> * Interaction with SW CFI schemes can be complicated and fragile, e.g.
>   requiring that we can recognise idiomatic codegen and remove
>   indirections, at least until clang provides more helpful
>   mechanisms for dealing with this.
> 
> For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
> really only need to enable/disable specific preemption functions. We can
> achieve the same effect without a number of the pain points above by
> using static keys to fold early return cases into the preemption
> functions themselves rather than in an out-of-line trampoline,
> effectively inlining the trampoline into the start of the function.
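
To make the quoted idea concrete, here is a minimal userspace sketch of
folding the early return into the function itself. It is only a model
under stated assumptions: a plain bool stands in for the static key (in
the kernel the check would be a patched branch/NOP rather than a load),
and every name below is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a static key. A real static key is a
 * runtime-patched branch/NOP, not a variable that gets loaded. */
static bool sk_cond_resched_enabled = true;

/* Hypothetical model state: "this task needs to reschedule". */
static bool need_resched_flag;
static int schedule_calls;

/* Models the regular form of the function: return 0 if there is
 * nothing to do, otherwise "schedule" and return 1. */
static int model_cond_resched(void)
{
	if (!need_resched_flag)
		return 0;
	schedule_calls++;
	need_resched_flag = false;
	return 1;
}

/* Models the dynamic wrapper: the disabled case is an early return at
 * the top of the function instead of an out-of-line trampoline. */
static int model_dynamic_cond_resched(void)
{
	if (!sk_cond_resched_enabled)
		return 0;
	return model_cond_resched();
}

/* Small self-check exercising both the disabled and enabled paths. */
static void model_selftest(void)
{
	sk_cond_resched_enabled = false;
	need_resched_flag = true;
	schedule_calls = 0;
	assert(model_dynamic_cond_resched() == 0);
	assert(schedule_calls == 0);

	sk_cond_resched_enabled = true;
	assert(model_dynamic_cond_resched() == 1);
	assert(schedule_calls == 1);
}
```

The point of the shape is that the caller always makes an ordinary
direct call to the wrapper; only the first branch inside the wrapper
differs between the enabled and disabled cases.
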
> 
> For arm64, this results in good code generation, e.g. the
> dynamic_cond_resched() wrapper looks as follows (with the first `B` being
> replaced with a `NOP` when the function is disabled):
> 
> | <dynamic_cond_resched>:
> |        bti     c
> |        b       <dynamic_cond_resched+0x10>
> |        mov     w0, #0x0                        // #0
> |        ret
> |        mrs     x0, sp_el0
> |        ldr     x0, [x0, #8]
> |        cbnz    x0, <dynamic_cond_resched+0x8>
> |        paciasp
> |        stp     x29, x30, [sp, #-16]!
> |        mov     x29, sp
> |        bl      <preempt_schedule_common>
> |        mov     w0, #0x1                        // #1
> |        ldp     x29, x30, [sp], #16
> |        autiasp
> |        ret
> 
> ... compared to the regular form of the function:
> 
> | <__cond_resched>:
> |        bti     c
> |        mrs     x0, sp_el0
> |        ldr     x1, [x0, #8]
> |        cbz     x1, <__cond_resched+0x18>
> |        mov     w0, #0x0                        // #0
> |        ret
> |        paciasp
> |        stp     x29, x30, [sp, #-16]!
> |        mov     x29, sp
> |        bl      <preempt_schedule_common>
> |        mov     w0, #0x1                        // #1
> |        ldp     x29, x30, [sp], #16
> |        autiasp
> |        ret
> 
> Any architecture which implements static keys should be able to use this
> to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
> calls.
> 
> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> Cc: Ard Biesheuvel <ardb at kernel.org>
> Cc: Frederic Weisbecker <frederic at kernel.org>
> Cc: Ingo Molnar <mingo at redhat.com>
> Cc: Juri Lelli <juri.lelli at redhat.com>
> Cc: Peter Zijlstra <peterz at infradead.org>

Does anyone have an opinion on that? Can we do better on the arm64 static call
side, or should we resign ourselves to going in the static keys direction?

Also, I assume that arm64 will need a static call implementation sooner or
later...

Thanks.



More information about the linux-arm-kernel mailing list