[PATCH] arm64: disable ARCH_CORRECT_STACKTRACE_ON_KRETPROBE tests

Mark Rutland mark.rutland at arm.com
Mon Nov 18 04:02:04 PST 2024


The kprobes_test suite's test_stacktrace_on_nested_kretprobe() test
currently fails on arm64, e.g.

| KTAP version 1
| 1..1
|     KTAP version 1
|     # Subtest: kprobes_test
|     # module: test_kprobes
|     1..7
|     ok 1 test_kprobe
|     ok 2 test_kprobes
|     ok 3 test_kprobe_missed
|     ok 4 test_kretprobe
|     ok 5 test_kretprobes
|     ok 6 test_stacktrace_on_kretprobe
|     # test_stacktrace_on_nested_kretprobe: EXPECTATION FAILED at lib/test_kprobes.c:327
|     Expected stack_buf[i + 1] == target_return_address[1], but
|         stack_buf[i + 1] == -96519936577004 (0xffffa83733777214)
|         target_return_address[1] == -96519936577136 (0xffffa83733777190)
|     # test_stacktrace_on_nested_kretprobe: EXPECTATION FAILED at lib/test_kprobes.c:338
|     Expected stack_buf[1] == target_return_address[1], but
|         stack_buf[1] == -96519936577004 (0xffffa83733777214)
|         target_return_address[1] == -96519936577136 (0xffffa83733777190)
|     not ok 7 test_stacktrace_on_nested_kretprobe
| # kprobes_test: pass:6 fail:1 skip:0 total:7
| # Totals: pass:6 fail:1 skip:0 total:7
| not ok 1 kprobes_test

The test assumes that when a stacktrace straddles an exception boundary,
no necessary entries will be omitted and no extraneous entries will be
reported, and that when unwinding from a kretprobed callee, the next
entry in the trace will be its immediate caller (whether or not that
caller is itself kretprobed).
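
Concretely, the test's return handler saves a stacktrace and then
expects the callee's and caller's real return addresses to appear
back-to-back in that trace. A simplified paraphrase of the check (the
names loosely follow lib/test_kprobes.c; this is illustrative, not the
exact code) is roughly:

| unsigned long stack_buf[STACK_BUF_SIZE];
| unsigned int i, nr;
|
| nr = stack_trace_save(stack_buf, ARRAY_SIZE(stack_buf), 0);
| for (i = 0; i + 1 < nr; i++) {
|         if (stack_buf[i] == target_return_address[0]) {
|                 /* The caller must be the *very next* entry. */
|                 KUNIT_EXPECT_EQ(current_test, stack_buf[i + 1],
|                                 target_return_address[1]);
|                 break;
|         }
| }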

Recently the arm64 stacktrace code was changed to always report the LR
at an exception boundary, where we don't know whether the LR is live.
In the case of the kretprobe trampoline the LR is not live at the time
the stacktrace is performed, and so the entry in the trace for the LR is
extraneous. This can be seen if a call to show_stack() is added to
stacktrace_internal_return_handler():

| Call trace:
|  show_stack+0x18/0x30 (C)
|  stacktrace_internal_return_handler+0x130/0x43c
|  __kretprobe_trampoline_handler+0xa0/0x130
|  kretprobe_breakpoint_handler+0x50/0x70
|  call_break_hook+0x74/0x8c
|  brk_handler+0x1c/0x60
|  do_debug_exception+0x68/0x114
|  el1_dbg+0x70/0x94
|  el1h_64_sync_handler+0xc4/0xe4
|  el1h_64_sync+0x6c/0x70
|  kprobe_stacktrace_target+0x34/0x48 (P)
|  kprobe_stacktrace_target+0x34/0x48 (LK) <-------- extra entry here
|  kprobe_stacktrace_driver+0x24/0x40 (K)
|  test_stacktrace_on_nested_kretprobe+0x84/0x160
|  kunit_try_run_case+0x6c/0x160
|  kunit_generic_run_threadfn_adapter+0x28/0x4c
|  kthread+0x110/0x114
|  ret_from_fork+0x10/0x20
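
The trace above was obtained with nothing more than a temporary call to
show_stack() in the return handler, roughly as below (a debug hack for
illustration only, not part of this patch; the handler body is elided):

| static int stacktrace_internal_return_handler(struct kretprobe_instance *ri,
|                                               struct pt_regs *regs)
| {
|         /* Debug only: dump the stack as the unwinder sees it here. */
|         show_stack(current, NULL, KERN_INFO);
|
|         /* ... the existing stack_trace_save() and KUNIT checks ... */
|         return 0;
| }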

This breaks test_stacktrace_on_nested_kretprobe() because while the
caller (kprobe_stacktrace_driver()) appears in the trace, it doesn't
occur *immediately* after the first instance of the callee
(kprobe_stacktrace_target()).

While this behaviour is unfortunate for the kretprobes tests, it is
desirable elsewhere (e.g. anywhere a human will read the trace), and is
otherwise not harmful.

For the moment, deselect ARCH_CORRECT_STACKTRACE_ON_KRETPROBE on arm64
to disable the tests which depend on this behaviour. With
ARCH_CORRECT_STACKTRACE_ON_KRETPROBE deselected, the remaining tests
work as expected, e.g.

| KTAP version 1
| 1..1
|     KTAP version 1
|     # Subtest: kprobes_test
|     # module: test_kprobes
|     1..5
|     ok 1 test_kprobe
|     ok 2 test_kprobes
|     ok 3 test_kprobe_missed
|     ok 4 test_kretprobe
|     ok 5 test_kretprobes
| # kprobes_test: pass:5 fail:0 skip:0 total:5
| # Totals: pass:5 fail:0 skip:0 total:5
| ok 1 kprobes_test

In future we have several options to improve matters, e.g.

* Add metadata and update arm64's unwinder to skip the LR in this case.
  This is likely to happen as part of work for RELIABLE_STACKTRACE for
  other reasons, and might solve this case by coincidence.

* Modify the kretprobes tests to only require that the caller appears
  somewhere in the trace after the callee, rather than requiring that it
  is *immediately* after the callee (a rough sketch of such a relaxed
  check follows this list). We might want separate strict/not-strict
  options for this.

* Use reliable stacktrace for these tests, so that architectures which
  cannot unwind across exception boundaries can explicitly handle this
  by returning an error.
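
As an illustration of the second option, the relaxed (not-strict)
variant could accept the caller anywhere after the callee rather than at
the very next index, along these lines (a sketch of one possible shape,
not a concrete proposal):

| bool seen_callee = false, seen_caller = false;
| unsigned int i;
|
| for (i = 0; i < nr; i++) {
|         if (!seen_callee) {
|                 seen_callee = (stack_buf[i] == target_return_address[0]);
|         } else if (stack_buf[i] == target_return_address[1]) {
|                 seen_caller = true;
|                 break;
|         }
| }
| KUNIT_EXPECT_TRUE(current_test, seen_callee && seen_caller);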

Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries")
Signed-off-by: Mark Rutland <mark.rutland at arm.com>
Reported-by: Kristina Martsenko <kristina.martsenko at arm.com>
Cc: Ard Biesheuvel <ardb at kernel.org>
Cc: Catalin Marinas <catalin.marinas at arm.com>
Cc: Josh Poimboeuf <jpoimboe at kernel.org>
Cc: Kalesh Singh <kaleshsingh at google.com>
Cc: Madhavan T. Venkataraman <madvenka at linux.microsoft.com>
Cc: Marc Zyngier <maz at kernel.org>
Cc: Mark Brown <broonie at kernel.org>
Cc: Masami Hiramatsu <mhiramat at kernel.org>
Cc: Miroslav Benes <mbenes at suse.cz>
Cc: Puranjay Mohan <puranjay12 at gmail.com>
Cc: Steven Rostedt <rostedt at goodmis.org>
Cc: Will Deacon <will at kernel.org>
---
 arch/arm64/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3e29b44d2d7bd..234d926290731 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -14,7 +14,6 @@ config ARM64
 	select ARCH_HAS_DEBUG_WX
 	select ARCH_BINFMT_ELF_EXTRA_PHDRS
 	select ARCH_BINFMT_ELF_STATE
-	select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
 	select ARCH_ENABLE_HUGEPAGE_MIGRATION if HUGETLB_PAGE && MIGRATION
 	select ARCH_ENABLE_MEMORY_HOTPLUG
 	select ARCH_ENABLE_MEMORY_HOTREMOVE
-- 
2.30.2



