[PATCH] arm64: implement support for static call trampolines

Ard Biesheuvel ardb at kernel.org
Mon Oct 19 13:23:20 EDT 2020


On Mon, 19 Oct 2020 at 19:05, Peter Zijlstra <peterz at infradead.org> wrote:
>
> On Mon, Oct 19, 2020 at 04:12:47PM +0200, Ard Biesheuvel wrote:
> > Implement arm64 support for the 'unoptimized' static call variety,
> > which routes all calls through a single trampoline that is patched
> > to perform a tail call to the selected function.
> >
> > Since static call targets may be located in modules loaded out of
> > direct branching range, we need to be able to fall back to issuing
> > an ADRP/ADD pair to load the branch target into R16 and use a BR
> > instruction. As this involves patching more than a single B or NOP
> > instruction (for which the architecture makes special provisions
> > in terms of the synchronization needed), we should take care to
> > only use aarch64_insn_patch_text_nosync() if the subsequent
> > instruction is still a 'RET' (which guarantees that the one being
> > patched is a B or a NOP).
>
> Aside from lacking objtool support (which is being worked on), is there
> anything else in the way of also doing inline patching for ARM64?
>

I implemented a GCC plugin for this a while ago [0], but I guess we'd
need to see some evidence first that it will actually make a
difference.

> That is; if the function is not reachable by the immediate you can
> always leave (re-instate) the call to the trampoline after patching
> that.
>

I don't think there is anything fundamentally preventing us from doing
that: the trampolines are guaranteed to be in range for all the
callers, as modules that are far away will have their own PLT
trampoline inside the module itself, so we can always fall back to the
trampoline if we cannot patch the branch itself. However, we might be
able to do better here, i.e., at patch time, we could detect whether
the call site points into a module trampoline, and update that one,
instead of bouncing from one trampoline to the next.
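
The fallback described above boils down to a range check at patch
time. Below is a minimal user-space sketch in C, not the kernel
implementation: branch_in_range(), encode_b() and pick_branch_dest()
are hypothetical names used for illustration only. The architectural
facts it relies on are that an AArch64 B instruction carries a signed
26-bit word offset (a reach of +/-128 MiB) and has the base opcode
0x14000000.

```c
#include <stdbool.h>
#include <stdint.h>

/* B/BL encode a signed 26-bit word offset: reach is [-128 MiB, +128 MiB). */
#define B_RANGE (128LL * 1024 * 1024)

/* Hypothetical helper: can a B placed at 'pc' reach 'target' directly? */
static bool branch_in_range(uint64_t pc, uint64_t target)
{
	int64_t off = (int64_t)(target - pc);

	/* Instructions are 4-byte aligned, so the offset must be too. */
	return (off & 3) == 0 && off >= -B_RANGE && off < B_RANGE;
}

/* Hypothetical helper: encode "B target" for a call site at 'pc'.
 * The caller must have checked branch_in_range() first. */
static uint32_t encode_b(uint64_t pc, uint64_t target)
{
	uint64_t off = target - pc;

	/* Keep the low 26 bits of the word offset (two's complement). */
	return 0x14000000u | (uint32_t)((off >> 2) & 0x03ffffffu);
}

/* Hypothetical helper: branch straight to the function when it is
 * reachable, otherwise keep going through the in-range trampoline. */
static uint64_t pick_branch_dest(uint64_t site, uint64_t func,
				 uint64_t tramp)
{
	return branch_in_range(site, func) ? func : tramp;
}
```

With this, a call site whose target is out of range simply keeps (or
has re-instated) its branch to the trampoline, which is assumed to be
in range for that caller.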

But again, we'd need to see some use cases first where it makes a
difference. And actually, the same reasoning applies to this patch - I
haven't done any performance testing, nor do I have any use cases in
mind where this sits on a hot path. The reason I have been looking at
static calls is to clean up kludges such as the one we have in
lib/crc-t10dif.c for switching to an arch optimized implementation at
runtime.

> Anyway, nice to see ARM64 support, thanks!

[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=static-calls


