[PATCH] ARM: perf: Use correct unwind info for tracepoint events

Tomislav Novak tnovak at fb.com
Tue Sep 20 17:24:47 PDT 2022


The address provided to perf_arch_fetch_caller_regs is CALLER_ADDR0, which
is inconsistent with the rest of the state captured by the macro (different
stack frames). If CONFIG_ARM_UNWIND=y, unwind_frame() uses this address as
the index into unwind tables, so it attempts to unwind one frame with its
caller's unwind info.

With 8d54a27593 ("ARM: allow unwinder to unwind recursive functions") this
no longer causes unwinding to be broken for tracepoint events (previously
unwind_frame() would detect that PC == LR and bail out), but the caller
does get printed twice:

  # bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }'
  Attaching 1 probe...
  @[
      __schedule+1468
      __schedule+1468
      schedule+132
      schedule_hrtimeout_range_clock+248
      [...]
  ]: 1

Unwinding tends to work okay in practice because perf_trace_* functions and
their callers usually have similar prologues. A common unwind insn sequence
is "sp=fp, sp-=X, pop_r4_to_rN", so even if prologues don't match exactly,
stored LR gets correctly popped off the stack.

Rather than relying on this, make sure that the ARM_pc and ARM_fp values
passed to the ARM unwinder belong to the same stack frame by using the
current PC in perf_arch_fetch_caller_regs() instead of the return address.
Note that perf_trace_* functions also get included as a result.

  # bpftrace -e 't:sched:sched_switch { @[kstack] = count(); exit(); }'
  Attaching 1 probe...
  @[
      perf_trace_sched_switch+132
      __schedule+1468
      schedule+132
      schedule_hrtimeout_range_clock+248
      [...]
  ]: 1

Signed-off-by: Tomislav Novak <tnovak at fb.com>
---

With the portable _THIS_IP_ macro, perf_fetch_caller_regs was not getting
inlined in my builds (address-of-label compiled into a pc-relative load of
the fixed address from the literal pool) and was therefore included in
every stack trace; using inline asm avoids this.

perf_trace_* functions could be skipped by placing them in a separate
section (similar to how in_sched_functions() works), but those are less
noisy so that might not be necessary.

 arch/arm/include/asm/perf_event.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/perf_event.h b/arch/arm/include/asm/perf_event.h
index 4f9dec489931..a1571a74e75b 100644
--- a/arch/arm/include/asm/perf_event.h
+++ b/arch/arm/include/asm/perf_event.h
@@ -19,8 +19,21 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 #define perf_misc_flags(regs)	perf_misc_flags(regs)
 #endif
 
+#ifdef CONFIG_ARM_UNWIND
+/*
+ * unwind_frame() uses ARM_pc as the unwind table index, so we need to
+ * make sure it points to a location within the current stack frame to
+ * be consistent with the rest of the state (namely, ARM_fp and ARM_sp).
+ */
+#define __fetch_pc(regs, __ip) { \
+	asm volatile ("mov %0, pc" : "=r" ((regs)->ARM_pc)); \
+}
+#else
+#define __fetch_pc(regs, __ip) { (regs)->ARM_pc = (__ip); }
+#endif
+
 #define perf_arch_fetch_caller_regs(regs, __ip) { \
-	(regs)->ARM_pc = (__ip); \
+	__fetch_pc((regs), (__ip)); \
 	(regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \
 	(regs)->ARM_sp = current_stack_pointer; \
 	(regs)->ARM_cpsr = SVC_MODE; \
-- 
2.30.2




More information about the linux-arm-kernel mailing list