[RFC PATCH -next v2 3/4] arm64/ftrace: support dynamically allocated trampolines
Steven Rostedt
rostedt at goodmis.org
Thu Apr 21 08:42:01 PDT 2022
On Thu, 21 Apr 2022 16:14:13 +0100
Mark Rutland <mark.rutland at arm.com> wrote:
> > Let's say you have 10 ftrace_ops registered (with bpf and kprobes this can
> > be quite common). But each of these ftrace_ops traces a function (or
> > functions) that are not being traced by the other ftrace_ops. That is, each
> > ftrace_ops has its own unique function(s) that they are tracing. One could
> > be tracing schedule, the other could be tracing ksoftirqd_should_run
> > (whatever).
>
> Ok, so that's when messing around with bpf or kprobes, and not generally
> when using plain old ftrace functionality under /sys/kernel/tracing/
> (unless that's concurrent with one of the former, as per your other
> reply) ?
It's any user of the ftrace infrastructure, which includes kprobes, bpf,
perf, function tracing, and function graph tracing. It also affects tracing
instances.
>
> > Without this change, because the arch does not support dynamically
> > allocated trampolines, it means that all these ftrace_ops will be
> > registered to the same trampoline. That means, for every function that is
> > traced, it will loop through all 10 of these ftrace_ops and check their
> > hashes to see if their callback should be called or not.
>
> Sure; I can see how that can be quite expensive.
>
> What I'm trying to figure out is who this matters to and when, since the
> implementation is going to come with a bunch of subtle/fractal
> complexities, and likely a substantial overhead too when enabling or
> disabling tracing of a patch-site. I'd like to understand the trade-offs
> better.
>
> > With dynamically allocated trampolines, each ftrace_ops will have their own
> > trampoline, and that trampoline will be called directly if the function
> > is only being traced by the one ftrace_ops. This is much more efficient.
> >
> > If a function is traced by more than one ftrace_ops, then it falls back to
> > the loop.
>
> I see -- so the dynamic trampoline is just to get the ops? Or is that
> doing additional things?
It's to get both the ftrace_ops (as that's one of the parameters) as well
as to call the callback directly. Not sure if arm is affected by Spectre,
but the "loop" function is filled with indirect function calls, whereas
the dynamic trampolines call the callback directly.
Instead of:

	bl ftrace_caller

ftrace_caller:
	[..]
	bl ftrace_ops_list_func
	[..]

void ftrace_ops_list_func(...)
{
	__do_for_each_ftrace_ops(op, ftrace_ops_list) {
		if (ftrace_ops_test(op, ip))	// test the hash to see if it
						// should trace this function
			op->func(...);
	}
}

It does:

	bl dynamic_tramp

dynamic_tramp:
	[..]
	bl func		// call the op->func directly!

Much more efficient!
>
> There might be a middle-ground here where we patch the ftrace_ops
> pointer into a literal pool at the patch-site, which would allow us to
> handle this atomically, and would avoid the issues with out-of-range
> trampolines.
Have an example of what you are suggesting?
-- Steve
More information about the linux-arm-kernel mailing list