[PATCH v3 1/2] arm64: Implement arch_stack_walk_reliable

Mark Rutland mark.rutland at arm.com
Mon May 19 06:41:06 PDT 2025


On Thu, Mar 20, 2025 at 10:15:58AM -0700, Song Liu wrote:
> With proper exception boundary detection, it is possible to implement
> arch_stack_walk_reliable without sframe.
> 
> Note that arch_stack_walk_reliable does not guarantee getting a reliable
> stack trace in all scenarios. Instead, it can reliably detect when the
> stack trace is not reliable, which is enough to provide reliable
> livepatching.
> 
> Signed-off-by: Song Liu <song at kernel.org>
> ---
>  arch/arm64/Kconfig             |  2 +-
>  arch/arm64/kernel/stacktrace.c | 66 +++++++++++++++++++++++++---------
>  2 files changed, 51 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 701d980ea921..31d5e1ee6089 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -276,6 +276,7 @@ config ARM64
>  	select HAVE_SOFTIRQ_ON_OWN_STACK
>  	select USER_STACKTRACE_SUPPORT
>  	select VDSO_GETRANDOM
> +	select HAVE_RELIABLE_STACKTRACE
>  	help
>  	  ARM 64-bit (AArch64) Linux support.
>  
> @@ -2500,4 +2501,3 @@ endmenu # "CPU Power Management"
>  source "drivers/acpi/Kconfig"
>  
>  source "arch/arm64/kvm/Kconfig"
> -
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 1d9d51d7627f..7e07911d8694 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -56,6 +56,7 @@ struct kunwind_state {
>  	enum kunwind_source source;
>  	union unwind_flags flags;
>  	struct pt_regs *regs;
> +	bool end_on_unreliable;
>  };
>  
>  static __always_inline void
> @@ -230,8 +231,26 @@ kunwind_next_frame_record(struct kunwind_state *state)
>  	new_fp = READ_ONCE(record->fp);
>  	new_pc = READ_ONCE(record->lr);
>  
> -	if (!new_fp && !new_pc)
> -		return kunwind_next_frame_record_meta(state);
> +	if (!new_fp && !new_pc) {
> +		int ret;
> +
> +		ret = kunwind_next_frame_record_meta(state);
> +		if (ret < 0) {
> +			/*
> +			 * This covers two different conditions:
> +			 *  1. ret == -ENOENT, unwinding is done.
> +			 *  2. ret == -EINVAL, unwinding hit error.
> +			 */
> +			return ret;
> +		}
> +		/*
> +		 * Searching across exception boundaries. The stack is now
> +		 * unreliable.
> +		 */
> +		if (state->end_on_unreliable)
> +			return -EINVAL;
> +		return 0;
> +	}

My original expectation for this was that we'd propagate the errors, and
then all the reliability logic would live in a consume_entry wrapper like
we have for BPF, e.g.

| static __always_inline bool
| arch_reliable_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
| {
|         struct kunwind_consume_entry_data *data = cookie;
| 
|         /*  
|          * When unwinding across an exception boundary, the PC will be
|          * reliable, but we do not know whether the FP is live, and so we
|          * cannot perform the *next* unwind reliably.
|          *
|          * Give up as soon as we hit an exception boundary.
|          */
|         if (state->source == KUNWIND_SOURCE_REGS_PC)
|                 return false;
| 
|         return data->consume_entry(data->cookie, state->common.pc);
| }
| 
| noinline noinstr int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
|                                               void *cookie,
|                                               struct task_struct *task)
| {
|         int ret;
|         struct kunwind_consume_entry_data data = { 
|                 .consume_entry = consume_entry,
|                 .cookie = cookie,
|         };  
| 
|         ret = kunwind_stack_walk(arch_reliable_kunwind_consume_entry, &data,
|                                  task, NULL);
|         return ret == -ENOENT ? 0 : ret;
| }

... and then in future we can add anything specific to reliable
stacktrace there.
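
For comparison, I'd expect the entry point using the flag as posted to
look something like the sketch below. To be clear, this is my
reconstruction rather than code quoted from the patch: the
kunwind_stack_walk_flags() helper name is illustrative, and
arch_kunwind_consume_entry() is the existing plain wrapper.

| noinline noinstr int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
|                                               void *cookie,
|                                               struct task_struct *task)
| {
|         struct kunwind_consume_entry_data data = {
|                 .consume_entry = consume_entry,
|                 .cookie = cookie,
|         };
|         int ret;
|
|         /*
|          * With end_on_unreliable set, kunwind_next_frame_record()
|          * returns -EINVAL at an exception boundary instead of
|          * continuing, so a clean -ENOENT here means every entry
|          * reported was reliable.
|          */
|         ret = kunwind_stack_walk_flags(arch_kunwind_consume_entry, &data,
|                                        task, NULL, /* end_on_unreliable */ true);
|         return ret == -ENOENT ? 0 : ret;
| }

The difference is mostly where the policy lives: the flag pushes it down
into kunwind_next_frame_record(), whereas the wrapper above keeps it in
the consume_entry callback, which is easier to extend.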

That aside, this generally looks good to me. The only thing that I note
is that we're lacking a check on the return value of
kretprobe_find_ret_addr(), and we should return -EINVAL when that is
NULL, but that should never happen in normal operation.
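
Concretely, what I have in mind is something like the following in
kunwind_recover_return_address() (untested sketch, assuming that function
is reworked to return an error code as elsewhere in this series; the
actual fixup is in the branch below):

| #ifdef CONFIG_KRETPROBES
|         if (is_kretprobe_trampoline(state->common.pc)) {
|                 unsigned long orig_pc;
|
|                 orig_pc = kretprobe_find_ret_addr(state->task,
|                                                   (void *)state->common.fp,
|                                                   &state->kr_cur);
|                 /*
|                  * kretprobe_find_ret_addr() returns 0 ("NULL") when it
|                  * cannot find the original return address; fail the
|                  * unwind rather than report a bogus trampoline PC.
|                  */
|                 if (!orig_pc)
|                         return -EINVAL;
|                 state->common.pc = orig_pc;
|         }
| #endif /* CONFIG_KRETPROBES */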

I've pushed an arm64/stacktrace-updates branch [1] with fixups for those
as two separate commits atop this one. If that looks good to you I
suggest we post that as a series and ask Will and Catalin to take that
as-is.

I'll look at the actual patching bits now.

Mark.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/ arm64/stacktrace-updates


