[PATCH v2 4/8] ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS

Steven Rostedt rostedt at goodmis.org
Tue Jan 17 12:52:03 PST 2023


On Fri, 13 Jan 2023 18:03:51 +0000
Mark Rutland <mark.rutland at arm.com> wrote:

> Architectures without dynamic ftrace trampolines incur an overhead when
> multiple ftrace_ops are enabled with distinct filters. in these cases,
> each call site calls a common trampoline which uses
> ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
> so incurs an overhead relative to the size of this list (including RCU
> protection overhead).
> 
> Architectures with dynamic ftrace trampolines avoid this overhead for
> call sites which have a single associated ftrace_ops. In these cases,
> the dynamic trampoline is customized to branch directly to the relevant
> ftrace function, avoiding the list overhead.
> 
> On some architectures it's impractical and/or undesirable to implement
> dynamic ftrace trampolines. For example, arm64 has limited branch ranges
> and cannot always directly branch from a call site to an arbitrary
> address (e.g. from a kernel text address to an arbitrary module
> address). Calls from modules to core kernel text can be indirected via
> PLTs (allocated at module load time) to address this, but the same is
> not possible from calls from core kernel text.
> 
> Using an indirect branch from a call site to an arbitrary trampoline is
> possible, but requires several more instructions in the function
> prologue (or immediately before it), and/or comes with far more complex
> requirements for patching.
> 
> Instead, this patch adds a new option, where an architecture can
> associate each call site with a pointer to an ftrace_ops, placed at a
> fixed offset from the call site. A shared trampoline can recover this
> pointer and call ftrace_ops::func() without needing to go via
> ftrace_ops_list_func(), avoiding the associated overhead.
> 
> This avoids issues with branch range limitations, and avoids the need to
> allocate and manipulate dynamic trampolines, making it far simpler to
> implement and maintain, while having similar performance
> characteristics.
> 
> Note that this allows for dynamic ftrace_ops to be invoked directly from
> an architecture's ftrace_caller trampoline, whereas existing code forces
> the use of ftrace_ops_get_list_func(), which is in part necessary to
> permit the ftrace_ops to be freed once unregistereed *and* to avoid
> branch/address-generation range limitation on some architectures (e.g.
> where ops->func is a module address, and may be outside of the direct
> branch range for callsites within the main kernel image).
> 
> The CALL_OPS approach avoids this problems and is safe as:
> 
> * The existing synchronization in ftrace_shutdown() using
>   ftrace_shutdown() using synchronize_rcu_tasks_rude() (and
>   synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
>   to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
>   or while invoking ftrace_ops::func), when that ftrace_ops is
>   unregistered.
> 
>   Arguably this could also be relied upon for the existing scheme,
>   permitting dynamic ftrace_ops to be invoked directly when ops->func is
>   in range, but this will require additional logic to handle branch
>   range limitations, and is not handled by this patch.
> 
> * Each callsite's ftrace_ops pointer literal can hold any valid kernel
>   address, and is updated atomically. As an architecture's ftrace_caller
>   trampoline will atomically load the ops pointer then derefrence
>   ops->func, there is no risk of invoking ops->func with a mismatches
>   ops pointer, and updates to the ops pointer do not require special
>   care.
> 
> A subsequent patch will implement architectures support for arm64. There
> should be no functional change as a result of this patch alone.
> 
> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Florent Revest <revest at chromium.org>
> Cc: Masami Hiramatsu <mhiramat at kernel.org>
> Cc: Peter Zijlstra <peterz at infradead.org>
> Cc: Steven Rostedt <rostedt at goodmis.org>
> Cc: Will Deacon <will at kernel.org>
> ---
> 

Looks good. Looking through it, I don't see any issues. Although I didn't
test it ;-)

I probably should, but in the mean time (as my tests are currently
broken)...

Reviewed-by: Steven Rostedt (Google) <rostedt at goodmis.org>

-- Steve



More information about the linux-arm-kernel mailing list