[PATCH] arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR

Mark Rutland mark.rutland at arm.com
Thu Oct 28 06:45:42 PDT 2021


Hi Will,

Looks like there are a bunch of things that warrant a v2, so I'll go
prepare that. It also seems there are some latent issue hit by the
ftrace selftests, which I'm taking a look at now.

On Thu, Oct 28, 2021 at 09:52:25AM +0100, Will Deacon wrote:
> On Wed, Oct 27, 2021 at 02:25:28PM +0100, Mark Rutland wrote:
> > When CONFIG_FUNCTION_GRAPH_TRACER is selected, and the function graph:
> 
> Weird ':' on the end of "graph"

Whoops; I'll go fix that with the other typos.

[...]

> > The underlying problem is that ftrace_graph_get_ret_stack() takes an
> > index offset from the most recent entry added to the fgraph return
> > stack. We start an unwind at offset 0, and increment the offset each
> > time we encounter a rewritten return address (i.e. when we see
> > `return_to_handler`). This is broken in two cases:
> > 
> > 1) Between creating a pt_regs and starting the unwind, function calls
> >    may place entries on the stack, leaving an abitrary offset which we
> >    can only determine by performing a full unwind from the caller of the
> >    unwind code (and relying on none of the unwind code being
> >    instrumented). While this initial unwind is open-coded in
> >    dump_backtrace(), this is not performed for other unwinders such as
> >    perf_callchain_kernel().
> 
> What is the user-visible impact of this?

Kernel callchains reported by perf may contain erroneous entries.

Case 2 means that dump_backtrace() may report entries in the wrong location in
some cases.

I'll see if I can get a clear example to put in the commit message, and I'll
explicitly mention those cases.

[...]

> > This doesn't fix the issue of skipping a live LR at an exception
> > boundary, which is a more general problem and requires more substantial
> > rework. Note that were we to consume the LR in all cases this would in
> 
> would in?
> 
> > warnings where the interrupted context's LR contains
> > `return_to_handler`, but the FP has been altered, e.g.
> > 
> > | func:
> > |	<--- ftrace entry ---> 	// logs FP & LR, rewrites FP
> > | 	STP	FP, LR, [SP, #-16]!
> > | 	MOV	FP, SP
> > | 	<--- INTERRUPT --->
> > 
> > ... as ftrace_graph_get_ret_stack() fill not find a matching entry,
> > triggering the WARN_ON_ONCE() in unwind_frame().
> 
> Did you run the ftrace selftests with this? If not, please can you do it?

I had not; I'd done a bunch of ad-hoc testing to deliberately trigger the
failing cases with a magic-sysrq backtrace (with that unwinder patched to start
from the regs).

I had a go with the selftests and there are some latent issues which show up on
a pristine v5.15-rc4 (or v5.15-rc7), using defconfig + the ftrace selftest
config fragment, including an OOM that could be a memory leak.

I see the following warning:

| [15] Generic dynamic event - check if duplicate events are caught
| ------------[ cut here ]------------
| WARNING: CPU: 24 PID: 912 at kernel/trace/trace_eprobe.c:853 __trace_eprobe_create+0x3ec/0x71c
| Modules linked in:
| CPU: 24 PID: 912 Comm: ftracetest Not tainted 5.15.0-rc4 #1
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __trace_eprobe_create+0x3ec/0x71c
| lr : __trace_eprobe_create+0x3ec/0x71c
| sp : ffff80001370bb60
| x29: ffff80001370bb60 x28: ffff5ee1944cc299 x27: ffff80001370bc28
| x26: 00000000000001c8 x25: ffffa8db50eece78 x24: 0000000000000010
| x23: ffffa8db50eece48 x22: 00000000ffffffed x21: 0000000000000003
| x20: ffff5ee1944cc008 x19: ffffa8db4fd2e398 x18: 0000000000000000
| x17: 7473753a29656d61 x16: 6e656c6966242830 x15: 2b3d656c69662074
| x14: 616e65706f5f7265 x13: 676e69727473753a x12: 29656d616e656c69
| x11: 0000000000000003 x10: 0101010101010101 x9 : ffffa8db4edda690
| x8 : 7f7f7f7f7f7f7f7f x7 : 726b6b6062727872 x6 : 1a171a17020d001f
| x5 : 1f000d02171a171a x4 : ffff5ee18d2b3900 x3 : ffffa8db50eece48
| x2 : 0000000000000000 x1 : ffff5ee18d2b3900 x0 : ffff5ee18d2b3900
| Call trace:
|  __trace_eprobe_create+0x3ec/0x71c
|  trace_probe_create+0x90/0xb0
|  eprobe_dyn_event_create+0x1c/0x30
|  create_dyn_event+0x74/0x114
|  trace_parse_run_command+0x104/0x7d0
|  dyn_event_write+0x1c/0x30
|  vfs_write+0xfc/0x290
|  ksys_write+0x74/0x100
|  __arm64_sys_write+0x28/0x3c
|  invoke_syscall+0x50/0x120
|  el0_svc_common.constprop.0+0x104/0x124
|  do_el0_svc+0x34/0x9c
|  el0_svc+0x2c/0x90
|  el0t_64_sync_handler+0xa8/0x12c
|  el0t_64_sync+0x1a0/0x1a4
| ---[ end trace 24f9a9e45b173e0b ]---
|         [FAIL]

That seems to be the following in __trace_eprobe_create():

| mutex_lock(&event_mutex);
| event_call = find_and_get_event(sys_name, sys_event);
| ep = alloc_event_probe(group, event, event_call, argc - 2);
| mutex_unlock(&event_mutex);
| 
| if (IS_ERR(ep)) {
| 	ret = PTR_ERR(ep);
| 	/* This must return -ENOMEM, else there is a bug */
| 	WARN_ON_ONCE(ret != -ENOMEM);
| 	ep = NULL;
| 	goto error;
| }

... but I haven't yet dug into exactly why that's returning an
unexpected error code.
 
Later the tests hang in:

| [19] event tracing - enable/disable with subsystem level files

... and from trying:

| # ./ftracetest -vvv test.d/event/subsystem-enable.tc 

... it seems to hang after the usual reset, on the first part of the
test, with the last output being:

| + . /root/ftrace/test.d/event/subsystem-enable.tc
| ++ echo 'sched:*'
| ++ yield
| ++ ping 127.0.0.1 -c 1
| PING 127.0.0.1 (127.0.0.1): 56 data bytes
| 64 bytes from 127.0.0.1: seq=0 ttl=64 time=2.442 ms
| 
| --- 127.0.0.1 ping statistics ---
| 1 packets transmitted, 1 packets received, 0% packet loss
| round-trip min/avg/max = 2.442/2.442/2.442 ms
| +++ cat trace
| +++ grep -v '^#'
| +++ awk '{ print $5 }'
| +++ sort -u
| +++ wc -l            

I'll see if I can figure out what's actually blocking, and see if the
remaining tests work.

> > Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> > Cc: Catalin Marinas <catalin.marinas at arm.com>
> > Cc: Madhavan T. Venkataraman <madvenka at linux.microsoft.com>
> > Cc: Mark Brown <broonie at kernel.org>
> > Cc: Will Deacon <will at kernel.org>
> 
> (Adding rostedt FYI)

I'll add Steve to the Cc list for v2.

> > ---
> >  arch/arm64/include/asm/ftrace.h     | 11 +++++++++++
> >  arch/arm64/include/asm/stacktrace.h |  6 ------
> >  arch/arm64/kernel/ftrace.c          |  6 +++---
> >  arch/arm64/kernel/stacktrace.c      | 12 +++++-------
> >  4 files changed, 19 insertions(+), 16 deletions(-)
> > 
> > I spotted this latent issue when reviewing Madhavan's stack tracing rework:
> > 
> >   https://lore.kernel.org/r/20211015025847.17694-1-madvenka@linux.microsoft.com
> > 
> > This supercedes the initial version I posted as an inline reply in:
> > 
> >   https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local
> 
> Please add a Link: tag referring to this.


Just to check, does that need to be a Link tag, or is it fine to do that
inline as:

| This supersedes an initial version posted as an inline reply in:
|
|   https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local  

... which is a bit clearer than saying which link to look at when there
are multiple.

... and should I drop the v1 link in there too?

> 
> > I believe we will need this as a prerequisite for other cleanups in that
> > series. As it fixes an existing bug I'm posting it on its own.
> > 
> > Mark.
> > 
> > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> > index 91fa4baa1a93..c96d47cb8f46 100644
> > --- a/arch/arm64/include/asm/ftrace.h
> > +++ b/arch/arm64/include/asm/ftrace.h
> > @@ -12,6 +12,17 @@
> >  
> >  #define HAVE_FUNCTION_GRAPH_FP_TEST
> >  
> > +/*
> > + * HAVE_FUNCTION_GRAPH_RET_ADDR_PTR means that the architecture can provide a
> > + * "return address pointer" which can be used to uniquely identify a return
> > + * address which has been overwritten.
> > + *
> > + * On arm64 we use the address of the caller's frame record, which remains the
> > + * same for the lifetime of the instrumented function, unlike the return
> > + * address in the LR.
> > + */
> > +#define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
> > +
> >  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> >  #define ARCH_SUPPORTS_FTRACE_OPS 1
> >  #else
> > diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> > index 8aebc00c1718..9a319eca5cab 100644
> > --- a/arch/arm64/include/asm/stacktrace.h
> > +++ b/arch/arm64/include/asm/stacktrace.h
> > @@ -46,9 +46,6 @@ struct stack_info {
> >   * @prev_type:   The type of stack this frame record was on, or a synthetic
> >   *               value of STACK_TYPE_UNKNOWN. This is used to detect a
> >   *               transition from one stack to another.
> > - *
> > - * @graph:       When FUNCTION_GRAPH_TRACER is selected, holds the index of a
> > - *               replacement lr value in the ftrace graph stack.
> >   */
> >  struct stackframe {
> >  	unsigned long fp;
> > @@ -56,9 +53,6 @@ struct stackframe {
> >  	DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
> >  	unsigned long prev_fp;
> >  	enum stack_type prev_type;
> > -#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > -	int graph;
> > -#endif
> >  };
> >  
> >  extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
> > diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
> > index 7f467bd9db7a..3e5d0ed63eb7 100644
> > --- a/arch/arm64/kernel/ftrace.c
> > +++ b/arch/arm64/kernel/ftrace.c
> > @@ -249,8 +249,6 @@ int __init ftrace_dyn_arch_init(void)
> >   * on the way back to parent. For this purpose, this function is called
> >   * in _mcount() or ftrace_caller() to replace return address (*parent) on
> >   * the call stack to return_to_handler.
> > - *
> > - * Note that @frame_pointer is used only for sanity check later.
> >   */
> >  void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
> >  			   unsigned long frame_pointer)
> > @@ -268,8 +266,10 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
> >  	 */
> >  	old = *parent;
> >  
> > -	if (!function_graph_enter(old, self_addr, frame_pointer, NULL))
> > +	if (!function_graph_enter(old, self_addr, frame_pointer,
> > +	    (void *)frame_pointer)) {
> >  		*parent = return_hooker;
> > +	}
> >  }
> >  
> >  #ifdef CONFIG_DYNAMIC_FTRACE
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index 8982a2b78acf..749b680b4999 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -38,9 +38,6 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
> >  {
> >  	frame->fp = fp;
> >  	frame->pc = pc;
> > -#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > -	frame->graph = 0;
> > -#endif
> >  
> >  	/*
> >  	 * Prime the first unwind.
> > @@ -116,17 +113,18 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >  	if (tsk->ret_stack &&
> >  		(ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
> > -		struct ftrace_ret_stack *ret_stack;
> > +		unsigned long orig_pc;
> >  		/*
> >  		 * This is a case where function graph tracer has
> >  		 * modified a return address (LR) in a stack frame
> >  		 * to hook a function return.
> >  		 * So replace it to an original value.
> >  		 */
> > -		ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
> > -		if (WARN_ON_ONCE(!ret_stack))
> > +		orig_pc = ftrace_graph_ret_addr(tsk, NULL, frame->pc,
> > +						(void *)frame->fp);
> 
> Should we be using ptrauth_strip_insn_pac(frame->pc) again here?
> ftrace_graph_ret_addr() appears to repeat the comparison against
> return_to_handler.

Yes, that'll need the stripped pointer, and I had not tested with PAC.

Given we unconditionally strip the PAC immediately after the
CONFIG_FUNCTION_GRAPH_TRACER, it'd be simpler to pull that up, i.e. as a fixup:

| diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
| index 749b680b4999..13ea4e4a4d27 100644
| --- a/arch/arm64/kernel/stacktrace.c
| +++ b/arch/arm64/kernel/stacktrace.c
| @@ -110,9 +110,11 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
|         frame->prev_fp = fp;
|         frame->prev_type = info.type;
|  
| +       frame->pc = ptrauth_strip_insn_pac(frame->pc);
| +
|  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
|         if (tsk->ret_stack &&
| -               (ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
| +               (frame->pc == (unsigned long)return_to_handler)) {
|                 unsigned long orig_pc;
|                 /*
|                  * This is a case where function graph tracer has
| @@ -128,8 +130,6 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
|         }
|  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
|  
| -       frame->pc = ptrauth_strip_insn_pac(frame->pc);
| -
|         return 0;
|  }
|  NOKPROBE_SYMBOL(unwind_frame);

I'll spin a v2 with that and the commit message cleanups.

> > +		if (WARN_ON_ONCE(frame->pc == orig_pc))
> >  			return -EINVAL;
> > -		frame->pc = ret_stack->ret;
> > +		frame->pc = orig_pc;
> >  	}
> >  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> 
> This will conflict (trivially) with the kretprobe stuff in -next but might
> be worth routing via the tracing tree anyway.
> 
> Will



More information about the linux-arm-kernel mailing list