[PATCH v2] arm64: implement support for static call trampolines

Ard Biesheuvel ardb at kernel.org
Thu Oct 29 06:58:52 EDT 2020


On Thu, 29 Oct 2020 at 11:40, Peter Zijlstra <peterz at infradead.org> wrote:
>
> On Wed, Oct 28, 2020 at 07:41:14PM +0100, Ard Biesheuvel wrote:
> > +/*
> > + * The static call trampoline consists of one of the following sequences:
> > + *
> > + *      (A)           (B)           (C)           (D)           (E)
> > + * 00: BTI  C        BTI  C        BTI  C        BTI  C        BTI  C
> > + * 04: B    fn       NOP           NOP           NOP           NOP
> > + * 08: RET           RET           ADRP X16, fn  ADRP X16, fn  ADRP X16, fn
> > + * 0c: NOP           NOP           ADD  X16, fn  ADD  X16, fn  ADD  X16, fn
> > + * 10:                             BR   X16      RET           NOP
> > + * 14:                                                         ADRP X16, &fn
> > + * 18:                                                         LDR  X16, [X16, &fn]
> > + * 1c:                                                         BR   X16
> > + *
> > + * The architecture permits us to patch B instructions into NOPs or vice versa
> > + * directly, but patching any other instruction sequence requires careful
> > + * synchronization. Since branch targets may be out of range for ordinary
> > + * immediate branch instructions, we may have to fall back to ADRP/ADD/BR
> > + * sequences in some cases, which complicates things considerably; since any
> > + * sleeping tasks may have been preempted right in the middle of any of these
> > + * sequences, we have to carefully transform one into the other, and ensure
> > + * that it is safe to resume execution at any point in the sequence for tasks
> > + * that have already executed part of it.
> > + *
> > + * So the rules are:
> > + * - we start out with (A) or (B)
> > + * - a branch within immediate range can always be patched in at offset 0x4;
> > + * - sequence (A) can be turned into (B) for NULL branch targets;
> > + * - a branch outside of immediate range can be patched using (C), but only if
> > + *   . the sequence being updated is (A) or (B), or
> > + *   . the branch target address modulo 4k results in the same ADD opcode
> > + *     (which could occur when patching the same far target a second time)
> > + * - once we have patched in (C) we cannot go back to (A) or (B), so patching
> > + *   in a NULL target now requires sequence (D);
> > + * - if we cannot patch a far target using (C), we fall back to sequence (E),
> > + *   which loads the function pointer from memory.
> > + *
> > + * If we abide by these rules, then the following must hold for tasks that were
> > + * interrupted halfway through execution of the trampoline:
> > + * - when resuming at offset 0x8, we can only encounter a RET if (B) or (D)
> > + *   was patched in at any point, and therefore a NULL target is valid;
> > + * - when resuming at offset 0xc, we are executing the ADD opcode that is only
> > + *   reachable via the preceding ADRP, and which is patched in only a single
> > + *   time, and is therefore guaranteed to be consistent with the ADRP target;
> > + * - when resuming at offset 0x10, X16 must refer to a valid target, since it
> > + *   is only reachable via an ADRP/ADD pair that is guaranteed to be consistent.
> > + *
> > + * Note that sequence (E) is only used when switching between multiple far
> > + * targets, and that it is not a terminal degraded state.
> > + */
>
> Would it make things easier if your trampoline consisted of two complete
> slots, between which you can flip?
>
> Something like:
>
>         0x00    B 0x24 / NOP
>         0x04    < slot 1 >
>                 ....
>         0x20
>         0x24    < slot 2 >
>                 ....
>         0x40
>
> Then each (0x20 byte) slot can contain any of the variants above and you
> can write the unused slot without stop-machine. Then, when the unused
> slot is populated, flip the initial instruction (like a static-branch),
> issue synchronize_rcu_tasks() and flip to using the other slot for next
> time.
>

Once we've populated a slot and activated it, we have to assume that
it is live and we can no longer modify it freely.

> Alternatively, you can patch the call-sites to point to the alternative
> trampoline slot, but that might be pushing things a bit.
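
For reference, the rules in the quoted comment can be read as a small
decision function. What follows is only a sketch of that reading, not code
from the patch: struct tramp_model, choose_patch() and in_branch_range() are
hypothetical names, and the model assumes that the ADD at offset 0xc stays
fixed once (C) has been patched in for the first time. PATCH_A stands for
"B fn at offset 0x4" with the rest of the trampoline left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which sequence from the quoted comment to patch in next. */
enum tramp_patch { PATCH_A, PATCH_B, PATCH_C, PATCH_D, PATCH_E };

/* Hypothetical bookkeeping; the real patch may track this differently. */
struct tramp_model {
	uint64_t addr;      /* address of the trampoline (offset 0x0)   */
	bool far_tail;      /* ADRP/ADD already present at 0x8 and 0xc? */
	uint32_t add_imm12; /* low 12 bits encoded in that ADD          */
};

/* B encodes a signed 26-bit word offset: +/-128 MiB from the branch. */
static bool in_branch_range(uint64_t insn, uint64_t target)
{
	int64_t off = (int64_t)(target - insn);

	return off >= -(INT64_C(1) << 27) && off < (INT64_C(1) << 27);
}

static enum tramp_patch choose_patch(struct tramp_model *t, uint64_t target)
{
	/* A64 instructions are 4-byte aligned. */
	assert((t->addr & 3) == 0 && (target & 3) == 0);

	/* NULL target: RET at 0x8 while still (A)/(B), else RET at 0x10. */
	if (target == 0)
		return t->far_tail ? PATCH_D : PATCH_B;

	/* A near target can always be reached by a B at offset 0x4. */
	if (in_branch_range(t->addr + 4, target))
		return PATCH_A;

	/* First far target: (C) is allowed, and the ADD is now fixed. */
	if (!t->far_tail) {
		t->far_tail = true;
		t->add_imm12 = (uint32_t)(target & 0xfff);
		return PATCH_C;
	}

	/* Later far target: (C) only if the ADD opcode would not change. */
	if ((target & 0xfff) == t->add_imm12)
		return PATCH_C;

	/* Otherwise load the function pointer from memory. */
	return PATCH_E;
}
```

Starting from a zeroed model, a NULL target selects (B), and the first far
target selects (C). A later far target with different low 12 bits selects
(E), and a NULL target after that selects (D).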


