[PATCH 4/8] ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS

Li Huafei lihuafei1 at huawei.com
Wed Jan 11 22:48:45 PST 2023



On 2023/1/9 21:58, Mark Rutland wrote:
> Architectures without dynamic ftrace trampolines incur an overhead when
> multiple ftrace_ops are enabled with distinct filters. in these cases,
> each call site calls a common trampoline which uses
> ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
> so incurs an overhead relative to the size of this list (including RCU
> protection overhead).
> 
> Architectures with dynamic ftrace trampolines avoid this overhead for
> call sites which have a single associated ftrace_ops. In these cases,
> the dynamic trampoline is customized to branch directly to the relevant
> ftrace function, avoiding the list overhead.
> 
> On some architectures it's impractical and/or undesirable to implement
> dynamic ftrace trampolines. For example, arm64 has limited branch ranges
> and cannot always directly branch from a call site to an arbitrary
> address (e.g. from a kernel text address to an arbitrary module
> address). Calls from modules to core kernel text can be indirected via
> PLTs (allocated at module load time) to address this, but the same is
> not possible from calls from core kernel text.
> 
> Using an indirect branch from a call site to an arbitrary trampoline is
> possible, but requires several more instructions in the function
> prologue (or immediately before it), and/or comes with far more complex
> requirements for patching.
> 
> Instead, this patch adds a new option, where an architecture can
> associate each call site with a pointer to an ftrace_ops, placed at a
> fixed offset from the call site. A shared trampoline can recover this
> pointer and call ftrace_ops::func() without needing to go via
> ftrace_ops_list_func(), avoiding the associated overhead.
> 
> This avoids issues with branch range limitations, and avoids the need to
> allocate and manipulate dynamic trampolines, making it far simpler to
> implement and maintain, while having similar performance
> characteristics.
> 
> Note that this allows for dynamic ftrace_ops to be invoked directly from
> an architecture's ftrace_caller trampoline, whereas existing code forces
> the use of ftrace_ops_get_list_func(), which is in part necessary to
> permit the ftrace_ops to be freed once unregistereed *and* to avoid
> branch/address-generation range limitation on some architectures (e.g.
> where ops->func is a module address, and may be outside of the direct
> branch range for callsites within the main kernel image).
> 
> The CALL_OPS approach avoids this problems and is safe as:
> 
> * The existing synchronization in ftrace_shutdown() using
>   ftrace_shutdown() using synchronize_rcu_tasks_rude() (and
>   synchronize_rcu_tasks()) ensures that no tasks hold a stale reference
>   to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline,
>   or while invoking ftrace_ops::func), when that ftrace_ops is
>   unregistered.
> 
>   Arguably this could also be relied upon for the existing scheme,
>   permitting dynamic ftrace_ops to be invoked directly when ops->func is
>   in range, but this will require additional logic to handle branch
>   range limitations, and is not handled by this patch.
> 
> * Each callsite's ftrace_ops pointer literal can hold any valid kernel
>   address, and is updated atomically. As an architecture's ftrace_caller
>   trampoline will atomically load the ops pointer then derefrence
>   ops->func, there is no risk of invoking ops->func with a mismatches
>   ops pointer, and updates to the ops pointer do not require special
>   care.
> 
> A subsequent patch will implement architectures support for arm64. There
> should be no functional change as a result of this patch alone.
> 
> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Florent Revest <revest at chromium.org>
> Cc: Masami Hiramatsu <mhiramat at kernel.org>
> Cc: Peter Zijlstra <peterz at infradead.org>
> Cc: Steven Rostedt <rostedt at goodmis.org>
> Cc: Will Deacon <will at kernel.org>
> ---
>  include/linux/ftrace.h |  15 +++++-
>  kernel/trace/Kconfig   |   7 +++
>  kernel/trace/ftrace.c  | 109 +++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 124 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 99f1146614c04..5eeb2776124c5 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -57,6 +57,9 @@ void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
>  void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
>  			       struct ftrace_ops *op, struct ftrace_regs *fregs);
>  #endif
> +extern const struct ftrace_ops ftrace_nop_ops;
> +extern const struct ftrace_ops ftrace_list_ops;
> +struct ftrace_ops *ftrace_find_unique_ops(struct dyn_ftrace *rec);

Hi Mark,

This patch has build issues on x86:

    CC      mm/readahead.o
  In file included from include/linux/perf_event.h:52:0,
                   from arch/x86/events/amd/lbr.c:2:
  include/linux/ftrace.h:62:50: error: ‘struct dyn_ftrace’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
   struct ftrace_ops *ftrace_find_unique_ops(struct dyn_ftrace *rec);

Here we should need 'struct dyn_ftrace' forward declaration.

Thanks,
Huafei

>  #endif /* CONFIG_FUNCTION_TRACER */
>  
>  /* Main tracing buffer and events set up */
> @@ -563,6 +566,8 @@ bool is_ftrace_trampoline(unsigned long addr);
>   *  IPMODIFY - the record allows for the IP address to be changed.
>   *  DISABLED - the record is not ready to be touched yet
>   *  DIRECT   - there is a direct function to call
> + *  CALL_OPS - the record can use callsite-specific ops
> + *  CALL_OPS_EN - the function is set up to use callsite-specific ops
>   *
>   * When a new ftrace_ops is registered and wants a function to save
>   * pt_regs, the rec->flags REGS is set. When the function has been
> @@ -580,9 +585,11 @@ enum {
>  	FTRACE_FL_DISABLED	= (1UL << 25),
>  	FTRACE_FL_DIRECT	= (1UL << 24),
>  	FTRACE_FL_DIRECT_EN	= (1UL << 23),
> +	FTRACE_FL_CALL_OPS	= (1UL << 22),
> +	FTRACE_FL_CALL_OPS_EN	= (1UL << 21),
>  };
>  
> -#define FTRACE_REF_MAX_SHIFT	23
> +#define FTRACE_REF_MAX_SHIFT	21
>  #define FTRACE_REF_MAX		((1UL << FTRACE_REF_MAX_SHIFT) - 1)
>  
>  #define ftrace_rec_count(rec)	((rec)->flags & FTRACE_REF_MAX)
> @@ -820,7 +827,8 @@ static inline int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>   */
>  extern int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr);
>  
> -#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +#if defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS) || \
> +	defined(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)
>  /**
>   * ftrace_modify_call - convert from one addr to another (no nop)
>   * @rec: the call site record (e.g. mcount/fentry)
> @@ -833,6 +841,9 @@ extern int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr);
>   * what we expect it to be, and then on success of the compare,
>   * it should write to the location.
>   *
> + * When using call ops, this is called when the associated ops change, even
> + * when (addr == old_addr).
> + *
>   * The code segment at @rec->ip should be a caller to @old_addr
>   *
>   * Return must be:
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 197545241ab83..5df427a2321df 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -42,6 +42,9 @@ config HAVE_DYNAMIC_FTRACE_WITH_REGS
>  config HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>  	bool
>  
> +config HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
> +	bool
> +
>  config HAVE_DYNAMIC_FTRACE_WITH_ARGS
>  	bool
>  	help
> @@ -257,6 +260,10 @@ config DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>  	depends on DYNAMIC_FTRACE_WITH_REGS
>  	depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>  
> +config DYNAMIC_FTRACE_WITH_CALL_OPS
> +	def_bool y
> +	depends on HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
> +
>  config DYNAMIC_FTRACE_WITH_ARGS
>  	def_bool y
>  	depends on DYNAMIC_FTRACE
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 442438b93fe98..c0b219ab89d2f 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -125,6 +125,33 @@ struct ftrace_ops global_ops;
>  void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
>  			  struct ftrace_ops *op, struct ftrace_regs *fregs);
>  
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
> +/*
> + * Stub used to invoke the list ops without requiring a separate trampoline.
> + */
> +const struct ftrace_ops ftrace_list_ops = {
> +	.func	= ftrace_ops_list_func,
> +	.flags	= FTRACE_OPS_FL_STUB,
> +};
> +
> +static void ftrace_ops_nop_func(unsigned long ip, unsigned long parent_ip,
> +				struct ftrace_ops *op,
> +				struct ftrace_regs *fregs)
> +{
> +	/* do nothing */
> +}
> +
> +/*
> + * Stub used when a call site is disabled. May be called transiently by threads
> + * which have made it into ftrace_caller but haven't yet recovered the ops at
> + * the point the call site is disabled.
> + */
> +const struct ftrace_ops ftrace_nop_ops = {
> +	.func	= ftrace_ops_nop_func,
> +	.flags  = FTRACE_OPS_FL_STUB,
> +};
> +#endif
> +
>  static inline void ftrace_ops_init(struct ftrace_ops *ops)
>  {
>  #ifdef CONFIG_DYNAMIC_FTRACE
> @@ -1814,6 +1841,18 @@ static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  			 * if rec count is zero.
>  			 */
>  		}
> +
> +		/*
> +		 * If the rec has a single associated ops, and ops->func can be
> +		 * called directly, allow the call site to call via the ops.
> +		 */
> +		if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS) &&
> +		    ftrace_rec_count(rec) == 1 &&
> +		    ftrace_ops_get_func(ops) == ops->func)
> +			rec->flags |= FTRACE_FL_CALL_OPS;
> +		else
> +			rec->flags &= ~FTRACE_FL_CALL_OPS;
> +
>  		count++;
>  
>  		/* Must match FTRACE_UPDATE_CALLS in ftrace_modify_all_code() */
> @@ -2108,8 +2147,9 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec)
>  		struct ftrace_ops *ops = NULL;
>  
>  		pr_info("ftrace record flags: %lx\n", rec->flags);
> -		pr_cont(" (%ld)%s", ftrace_rec_count(rec),
> -			rec->flags & FTRACE_FL_REGS ? " R" : "  ");
> +		pr_cont(" (%ld)%s%s", ftrace_rec_count(rec),
> +			rec->flags & FTRACE_FL_REGS ? " R" : "  ",
> +			rec->flags & FTRACE_FL_CALL_OPS ? " O" : "  ");
>  		if (rec->flags & FTRACE_FL_TRAMP_EN) {
>  			ops = ftrace_find_tramp_ops_any(rec);
>  			if (ops) {
> @@ -2177,6 +2217,7 @@ static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
>  		 * want the direct enabled (it will be done via the
>  		 * direct helper). But if DIRECT_EN is set, and
>  		 * the count is not one, we need to clear it.
> +		 *
>  		 */
>  		if (ftrace_rec_count(rec) == 1) {
>  			if (!(rec->flags & FTRACE_FL_DIRECT) !=
> @@ -2185,6 +2226,19 @@ static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
>  		} else if (rec->flags & FTRACE_FL_DIRECT_EN) {
>  			flag |= FTRACE_FL_DIRECT;
>  		}
> +
> +		/*
> +		 * Ops calls are special, as count matters.
> +		 * As with direct calls, they must only be enabled when count
> +		 * is one, otherwise they'll be handled via the list ops.
> +		 */
> +		if (ftrace_rec_count(rec) == 1) {
> +			if (!(rec->flags & FTRACE_FL_CALL_OPS) !=
> +			    !(rec->flags & FTRACE_FL_CALL_OPS_EN))
> +				flag |= FTRACE_FL_CALL_OPS;
> +		} else if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
> +			flag |= FTRACE_FL_CALL_OPS;
> +		}
>  	}
>  
>  	/* If the state of this record hasn't changed, then do nothing */
> @@ -2229,6 +2283,21 @@ static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
>  					rec->flags &= ~FTRACE_FL_DIRECT_EN;
>  				}
>  			}
> +
> +			if (flag & FTRACE_FL_CALL_OPS) {
> +				if (ftrace_rec_count(rec) == 1) {
> +					if (rec->flags & FTRACE_FL_CALL_OPS)
> +						rec->flags |= FTRACE_FL_CALL_OPS_EN;
> +					else
> +						rec->flags &= ~FTRACE_FL_CALL_OPS_EN;
> +				} else {
> +					/*
> +					 * Can only call directly if there's
> +					 * only one sets of associated ops.
> +					 */
> +					rec->flags &= ~FTRACE_FL_CALL_OPS_EN;
> +				}
> +			}
>  		}
>  
>  		/*
> @@ -2258,7 +2327,8 @@ static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
>  			 * and REGS states. The _EN flags must be disabled though.
>  			 */
>  			rec->flags &= ~(FTRACE_FL_ENABLED | FTRACE_FL_TRAMP_EN |
> -					FTRACE_FL_REGS_EN | FTRACE_FL_DIRECT_EN);
> +					FTRACE_FL_REGS_EN | FTRACE_FL_DIRECT_EN |
> +					FTRACE_FL_CALL_OPS_EN);
>  	}
>  
>  	ftrace_bug_type = FTRACE_BUG_NOP;
> @@ -2431,6 +2501,25 @@ ftrace_find_tramp_ops_new(struct dyn_ftrace *rec)
>  	return NULL;
>  }
>  
> +struct ftrace_ops *
> +ftrace_find_unique_ops(struct dyn_ftrace *rec)
> +{
> +	struct ftrace_ops *op, *found = NULL;
> +	unsigned long ip = rec->ip;
> +
> +	do_for_each_ftrace_op(op, ftrace_ops_list) {
> +
> +		if (hash_contains_ip(ip, op->func_hash)) {
> +			if (found)
> +				return NULL;
> +			found = op;
> +		}
> +
> +	} while_for_each_ftrace_op(op);
> +
> +	return found;
> +}
> +
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>  /* Protected by rcu_tasks for reading, and direct_mutex for writing */
>  static struct ftrace_hash *direct_functions = EMPTY_HASH;
> @@ -3780,11 +3869,12 @@ static int t_show(struct seq_file *m, void *v)
>  	if (iter->flags & FTRACE_ITER_ENABLED) {
>  		struct ftrace_ops *ops;
>  
> -		seq_printf(m, " (%ld)%s%s%s",
> +		seq_printf(m, " (%ld)%s%s%s%s",
>  			   ftrace_rec_count(rec),
>  			   rec->flags & FTRACE_FL_REGS ? " R" : "  ",
>  			   rec->flags & FTRACE_FL_IPMODIFY ? " I" : "  ",
> -			   rec->flags & FTRACE_FL_DIRECT ? " D" : "  ");
> +			   rec->flags & FTRACE_FL_DIRECT ? " D" : "  ",
> +			   rec->flags & FTRACE_FL_CALL_OPS ? " O" : "  ");
>  		if (rec->flags & FTRACE_FL_TRAMP_EN) {
>  			ops = ftrace_find_tramp_ops_any(rec);
>  			if (ops) {
> @@ -3800,6 +3890,15 @@ static int t_show(struct seq_file *m, void *v)
>  		} else {
>  			add_trampoline_func(m, NULL, rec);
>  		}
> +		if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
> +			ops = ftrace_find_unique_ops(rec);
> +			if (ops) {
> +				seq_printf(m, "\tops: %pS (%pS)",
> +					   ops, ops->func);
> +			} else {
> +				seq_puts(m, "\tops: ERROR!");
> +			}
> +		}
>  		if (rec->flags & FTRACE_FL_DIRECT) {
>  			unsigned long direct;
>  
> 



More information about the linux-arm-kernel mailing list