[PATCH v6 0/8] perf cs-etm: Support thread stack and callchain

Arnaldo Carvalho de Melo acme at kernel.org
Fri May 29 07:57:14 PDT 2026


On Tue, May 26, 2026 at 05:59:36PM +0100, Leo Yan wrote:
> This series adds thread-stack and synthesized callchain support for Arm
> CoreSight. It derives from an older series [1] but is heavily rewritten.

Hi Leo,

	Please add what changed from v5, v4, etc.

- Arnaldo
 
> CS ETM previously kept last-branch state in a per-trace-queue buffer.
> That effectively makes the state per CPU, while the call/return history
> belongs to a thread. This series moves branch tracking to the common
> thread-stack code.
> 
> The series records CoreSight branches with thread_stack__event(), uses
> thread_stack__br_sample() for last branch entries, and flushes thread
> stacks after decoder resets.
> 
> A decoder reset between AUX trace buffers is treated as a global trace
> discontinuity, so all thread stacks are flushed. This avoids carrying
> stale call/return history across the discontinuity.
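
To illustrate the flush-on-reset behaviour described above, here is a
minimal sketch. It is a toy model, not perf's thread-stack code: the
ThreadStacks class and its call(), flush_all() and depth() methods are
hypothetical names invented for this example. It only shows why a global
discontinuity has to clear every thread's history rather than just the
one being decoded.

```python
# Toy model only: hypothetical names, not perf's thread-stack API.
class ThreadStacks:
    def __init__(self):
        # One return-address stack per thread id (not per CPU or queue).
        self.stacks = {}

    def call(self, tid, ret_addr):
        # Record a call by pushing its expected return address.
        self.stacks.setdefault(tid, []).append(ret_addr)

    def flush_all(self):
        # A decoder reset is a global discontinuity: any thread may have
        # called or returned while trace was lost, so drop all history.
        for stack in self.stacks.values():
            stack.clear()

    def depth(self, tid):
        return len(self.stacks.get(tid, []))


stacks = ThreadStacks()
stacks.call(100, 0xaaaa1000)   # thread 100 calls something
stacks.call(200, 0xaaaa2000)   # thread 200 calls something
stacks.flush_all()             # decoder reset between AUX buffers
```

After flush_all(), both threads start again from an empty stack, so no
synthesized callchain can mix frames from before and after the gap.
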
> 
> One limitation remains for instructions emulated by the kernel. In that
> case the exception return address may not match the return address
> stored in the thread stack, because the exception return can land one
> instruction ahead. The stack can still recover when a later return
> matches an upper caller. Emulated instructions are not a common target
> for performance callchain analysis, and supporting them would require
> extending the common thread-stack path to accept both the real target
> address and an adjusted address for stack matching, so this series
> leaves that extra complexity out.
> 
> The series has been tested on an Orion6 board:
> 
>   perf test 150 -vvv
> 
>   150: Check Arm CoreSight synthesized callchain:
>   --- start ---
>   test child forked, pid 13528
>   Test callchain push: PASS
>   Test callchain pop: PASS
>   ---- end(0) ----
>   150: Check Arm CoreSight synthesized callchain                       : Ok
> 
>   perf script --itrace=g16i10il64
> 
>   callchain_test   17468 [005] 1031003.229943:         10 instructions:
>               aaaac32507c4 main+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
>               ffff90bd233c call_init+0x9c (inlined)
>               ffff90bd233c __libc_start_main_impl+0x9c (inlined)
>               aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
> 
>   callchain_test   17468 [005] 1031003.229943:         10 instructions:
>               aaaac3250774 do_svc+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
>               ffff90bd233c call_init+0x9c (inlined)
>               ffff90bd233c __libc_start_main_impl+0x9c (inlined)
>               aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
> 
>   callchain_test   17468 [005] 1031003.229944:         10 instructions:
>           ffff800080010c20 vectors+0x420 ([kernel.kallsyms])
>               aaaac3250784 do_svc+0x1c (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
>               ffff90bd233c call_init+0x9c (inlined)
>               ffff90bd233c __libc_start_main_impl+0x9c (inlined)
>               aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
> 
> Note that the test fails on the Juno board because of many
> discontinuity packets (mainly NO_SYNC elements). This is likely caused
> by FIFO overflow on the path.
> 
> [1] https://lore.kernel.org/linux-arm-kernel/20200220052701.7754-1-leo.yan@linaro.org/
> 
> Signed-off-by: Leo Yan <leo.yan at arm.com>
> ---
> Leo Yan (8):
>       perf cs-etm: Decode ETE exception packets
>       perf cs-etm: Refactor instruction size handling
>       perf cs-etm: Use thread-stack for last branch entries
>       perf cs-etm: Flush thread stacks after decoder reset
>       perf cs-etm: Support call indentation
>       perf cs-etm: Filter synthesized branch samples
>       perf cs-etm: Synthesize callchains for instruction samples
>       perf test: Add Arm CoreSight callchain test
> 
>  .../tests/shell/test_arm_coresight_callchain.sh    | 235 ++++++++++++++++
>  tools/perf/util/cs-etm.c                           | 309 ++++++++++++---------
>  2 files changed, 408 insertions(+), 136 deletions(-)
> ---
> base-commit: bd2a5be1fe731bc7548205dd148db75f1d588da2
> change-id: 20260521-b4-arm_cs_callchain_support_v1-2c2a70719bcc
> 
> Best regards,
> -- 
> Leo Yan <leo.yan at arm.com>
> 


