[PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support

Mark Rutland mark.rutland at arm.com
Wed May 25 06:38:55 PDT 2022


On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
> Add ftrace direct support for arm64.
> 
> 1. When there is custom trampoline only, replace the fentry nop to a
>    jump instruction that jumps directly to the custom trampoline.
> 
> 2. When ftrace trampoline and custom trampoline coexist, jump from
>    fentry to ftrace trampoline first, then jump to custom trampoline
>    when ftrace trampoline exits. The current unused register
>    pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
>    trampoline to custom trampoline.

For those of us not all that familiar with BPF, can you explain *why* you want
this? The above explains what the patch implements, but not why that's useful.

e.g. is this just to avoid the overhead of the ops list processing in the
regular ftrace code, or is the custom trampoline there to allow you to do
something special?

There is another patch series on the list from some of your colleagues which
uses dynamic trampolines to try to avoid that ops list overhead, and it's not
clear to me whether these are trying to solve largely the same problem or
something different. That other thread is at:

  https://lore.kernel.org/linux-arm-kernel/20220316100132.244849-1-bobo.shaobowang@huawei.com/

... and I've added the relevant parties to CC here, since there doesn't seem to
be any overlap in the CC lists of the two threads.

In that other thread I've suggested a general approach we could follow at:
  
  https://lore.kernel.org/linux-arm-kernel/YmGF%2FOpIhAF8YeVq@lakrids/

As noted in that thread, I have a few concerns which equally apply here:

* Due to the limited range of BL instructions, it's not always possible to
  patch an ftrace call-site to branch to an arbitrary trampoline. The way this
  works for ftrace today relies upon knowing the set of trampolines at
  compile-time, and allocating module PLTs for those, and that approach cannot
  work reliably for dynamically allocated trampolines.

  I'd strongly prefer to avoid custom trampolines unless they're strictly
  necessary for functional reasons, so that we can have this work reliably and
  consistently.

* If this is mostly about avoiding the ops list processing overhead, I believe
  we can implement some custom ops support more generally in ftrace which would
  still use a common trampoline but could directly call into those custom ops.
  I would strongly prefer this over custom trampolines.

* I'm looking to minimize the set of regs ftrace saves, and never save a full
  pt_regs, since today we (incompletely) fill that with bogus values and cannot
  acquire some state reliably (e.g. PSTATE). I'd like to avoid usage of pt_regs
  unless necessary, and I don't want to add additional reliance upon that
  structure.
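
To make the first concern concrete: an AArch64 BL encodes a signed 26-bit
word offset, so it can only reach targets within +/-128 MiB of the call-site.
The following is a minimal standalone sketch of that range check, not kernel
code; the names bl_in_range() and bl_encode() are hypothetical and chosen
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BL holds a signed 26-bit word offset: imm26 * 4 gives +/-128 MiB. */
#define BL_RANGE ((int64_t)1 << 27)	/* 128 MiB in bytes */

/* Hypothetical helper: can a BL placed at 'pc' reach 'target' directly? */
static bool bl_in_range(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	/* A64 instructions are 4-byte aligned, so the offset must be too. */
	if (offset & 3)
		return false;

	return offset >= -BL_RANGE && offset < BL_RANGE;
}

/* Hypothetical helper: encode "BL target" for an instruction at 'pc'. */
static uint32_t bl_encode(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	/* Callers must fall back to a PLT/veneer when out of range. */
	assert(bl_in_range(pc, target));

	/* Opcode 0x94000000 with the low 26 bits of the word offset. */
	return 0x94000000u | (uint32_t)(((uint64_t)offset >> 2) & 0x03ffffffu);
}
```

A trampoline allocated at an arbitrary address at runtime may fail this check
for some call-sites, which is why a branch target known at compile-time (with
a PLT reserved for it) is the reliable case.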

> Signed-off-by: Xu Kuohai <xukuohai at huawei.com>
> Acked-by: Song Liu <songliubraving at fb.com>
> ---
>  arch/arm64/Kconfig               |  2 ++
>  arch/arm64/include/asm/ftrace.h  | 12 ++++++++++++
>  arch/arm64/kernel/asm-offsets.c  |  1 +
>  arch/arm64/kernel/entry-ftrace.S | 18 +++++++++++++++---
>  4 files changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 57c4c995965f..81cc330daafc 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -177,6 +177,8 @@ config ARM64
>  	select HAVE_DYNAMIC_FTRACE
>  	select HAVE_DYNAMIC_FTRACE_WITH_REGS \
>  		if $(cc-option,-fpatchable-function-entry=2)
> +	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS \
> +		if DYNAMIC_FTRACE_WITH_REGS
>  	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
>  		if DYNAMIC_FTRACE_WITH_REGS
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index 1494cfa8639b..14a35a5df0a1 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -78,6 +78,18 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
>  	return addr;
>  }
>  
> +#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> +static inline void arch_ftrace_set_direct_caller(struct pt_regs *regs,
> +						 unsigned long addr)
> +{
> +	/*
> +	 * Place custom trampoline address in regs->orig_x0 to let ftrace
> +	 * trampoline jump to it.
> +	 */
> +	regs->orig_x0 = addr;
> +}
> +#endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */

Please, let's not abuse pt_regs::orig_x0 for this. That's at best unnecessarily
confusing, and if we really need a field to place a value like this it implies
we should add an ftrace-specific structure to hold the ftrace-specific context
information.
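
As a rough illustration of that idea, here is a standalone sketch of an
ftrace-specific context with a dedicated field for the custom trampoline
address. Everything in it is an assumption made for illustration: the
structure name, its layout, and both helpers are hypothetical and do not
correspond to an existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ftrace-only context, separate from struct pt_regs. */
struct ftrace_ctx_sketch {
	uint64_t args[9];	/* x0-x8: arguments and indirect result */
	uint64_t fp;
	uint64_t lr;
	uint64_t sp;
	uint64_t pc;
	uint64_t direct_tramp;	/* 0 means "no custom trampoline" */
};

/* Layout assumption of this sketch only: 14 64-bit slots, no padding. */
static_assert(sizeof(struct ftrace_ctx_sketch) == 14 * sizeof(uint64_t),
	      "unexpected padding in ftrace_ctx_sketch");

/* Hypothetical counterpart of arch_ftrace_set_direct_caller(). */
static inline void ctx_set_direct_caller(struct ftrace_ctx_sketch *ctx,
					 uint64_t addr)
{
	ctx->direct_tramp = addr;
}

/* Address the common trampoline would branch to when it exits. */
static inline uint64_t ctx_return_target(const struct ftrace_ctx_sketch *ctx)
{
	return ctx->direct_tramp ? ctx->direct_tramp : ctx->pc;
}
```

The explicit field name makes the intent visible at each use, whereas
orig_x0 has an unrelated meaning elsewhere in the arm64 code.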

Thanks,
Mark.

> +
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>  struct dyn_ftrace;
>  int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 1197e7679882..b1ed0bf01c59 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -80,6 +80,7 @@ int main(void)
>    DEFINE(S_SDEI_TTBR1,		offsetof(struct pt_regs, sdei_ttbr1));
>    DEFINE(S_PMR_SAVE,		offsetof(struct pt_regs, pmr_save));
>    DEFINE(S_STACKFRAME,		offsetof(struct pt_regs, stackframe));
> +  DEFINE(S_ORIG_X0,		offsetof(struct pt_regs, orig_x0));
>    DEFINE(PT_REGS_SIZE,		sizeof(struct pt_regs));
>    BLANK();
>  #ifdef CONFIG_COMPAT
> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> index e535480a4069..dfe62c55e3a2 100644
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -60,6 +60,9 @@
>  	str	x29, [sp, #S_FP]
>  	.endif
>  
> +	/* Set orig_x0 to zero  */
> +	str     xzr, [sp, #S_ORIG_X0]
> +
>  	/* Save the callsite's SP and LR */
>  	add	x10, sp, #(PT_REGS_SIZE + 16)
>  	stp	x9, x10, [sp, #S_LR]
> @@ -119,12 +122,21 @@ ftrace_common_return:
>  	/* Restore the callsite's FP, LR, PC */
>  	ldr	x29, [sp, #S_FP]
>  	ldr	x30, [sp, #S_LR]
> -	ldr	x9, [sp, #S_PC]
> -
> +	ldr	x10, [sp, #S_PC]
> +
> +	ldr	x11, [sp, #S_ORIG_X0]
> +	cbz	x11, 1f
> +	/* Set x9 to parent ip before jump to custom trampoline */
> +	mov	x9,  x30
> +	/* Set lr to self ip */
> +	ldr	x30, [sp, #S_PC]
> +	/* Set x10 (used for return address) to custom trampoline */
> +	mov	x10, x11
> +1:
>  	/* Restore the callsite's SP */
>  	add	sp, sp, #PT_REGS_SIZE + 16
>  
> -	ret	x9
> +	ret	x10
>  SYM_CODE_END(ftrace_common)
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> -- 
> 2.30.2
> 
