[PATCH v2] arm64: implement support for static call trampolines

Ard Biesheuvel ardb at kernel.org
Mon Nov 16 05:31:10 EST 2020


On Mon, 16 Nov 2020 at 11:18, Quentin Perret <qperret at google.com> wrote:
>
> On Thursday 29 Oct 2020 at 11:54:42 (+0000), Quentin Perret wrote:
> > The reason I'm interested in this is because Android makes heavy use of
> > trace points/hooks, so any potential improvement in this area would be
> > welcome. Now I agree we need numbers to show the benefit is real before
> > this can be considered for inclusion in the kernel. I'll try and see if
> > we can get something.
>
> Following up on this as we've just figured out what was causing
> performance issues in our use-case. Basically, we have a setup where
> some modules attach to trace hooks for a few things (e.g. the pelt
> scheduler hooks + other Android-specific hooks), and that appeared to
> cause up to ~6% perf regression on the Androbench benchmark.
>
> The bulk of the regression came from a feature that is currently
> Android-specific but should hopefully make it upstream (soon?): Control
> Flow Integrity (CFI) -- see [1] for more details. In essence CFI is a
> software-based cousin of BTI, which is basically about ensuring the
> target of an indirect function call has a compatible prototype. This can
> be relatively easily checked for potential targets that are known at
> compile-time, but is a little harder when the targets are dynamically
> loaded, hence causing extra overhead when the target is in a module.
>
> Anyway, I don't think any of the above is particularly relevant to
> upstream just yet, but I figured this would be interesting to share. The
> short-term fix for Android was to locally disable CFI checks around the
> trace hooks that cause the perf regression, but I think static-calls
> would be a preferable alternative to that (I'll try to confirm that
> experimentally). And when/if CFI makes it upstream, then that may become
> relevant to upstream as well, though the integration of CFI and
> static-calls is not very clear yet.
>

OK, so that would suggest that having at least the out-of-line
trampoline would help with CFI, but only because the indirect call is
decorated with CFI checks, not because the indirect call itself is any
slower.

So that suggests that something like

  bti    c
  ldr    x16, 0f
  br     x16
0:.quad  <target>

may well be sufficient in the arm64 case - it is hidden from the
compiler, so we don't get the CFI overhead, and since it is emitted
as .text (and therefore requires code patching to be updated), it does
not need the same level of protection that CFI offers elsewhere when
it comes to indirect calls.
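
To illustrate the control flow only, here is a rough user-space model of
that trampoline in plain C. All names (hook_fn, tramp_target,
hook_trampoline, hook_update) are made up for this sketch and are not
kernel APIs. Note that the C version still performs an indirect call
through a writable pointer, which a CFI-enabled compiler would
instrument; the assembly version avoids that precisely because the
branch is not visible to the compiler and the literal sits in .text.

```c
#include <assert.h>

/* Hypothetical model, not kernel code. */
typedef int (*hook_fn)(int);

static int default_hook(int x) { return x; }
static int module_hook(int x)  { return x + 1; }

/*
 * Stands in for the "0: .quad <target>" literal. In the real trampoline
 * this lives in .text and can only be changed by code patching; here it
 * is an ordinary writable variable (an assumption of the model).
 */
static hook_fn tramp_target = default_hook;

/* Stands in for "ldr x16, 0f; br x16": load the target and jump to it. */
static int hook_trampoline(int x)
{
	return tramp_target(x);
}

/* Stands in for the code-patching step that rewrites the literal. */
static void hook_update(hook_fn fn)
{
	tramp_target = fn;
}

/* Callers always go through the trampoline; only the literal changes. */
static void hook_selfcheck(void)
{
	assert(hook_trampoline(41) == 41);
	hook_update(module_hook);
	assert(hook_trampoline(41) == 42);
	hook_update(default_hook);
}
```

The call sites never change in this model: they always call
hook_trampoline(), and retargeting touches a single word.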


