[PATCH] riscv: stacktrace: fixed walk_stackframe()
Samuel Holland
samuel.holland at sifive.com
Wed May 8 08:51:04 PDT 2024
Hi Matthew,
Thanks for the patch!
On 2024-04-26 2:24 AM, Matthew Bystrin wrote:
> If the load access fault occures in a function without callee
> (CONFIG_FRAME_POINTER=y), when wrong stack trace will be displayed:
>
> [<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c
> ---[ end trace 0000000000000000 ]---
>
> Registers dump:
> ra 0xffffffff80485758 <regmap_mmio_read+36>
> sp 0xffffffc80200b9a0
> fp 0xffffffc80200b9b0
> pc 0xffffffff804853ba <regmap_mmio_read32le+6>
>
> Stack dump:
> 0xffffffc80200b9a0: 0xffffffc80200b9e0 0xffffffc80200b9e0
> 0xffffffc80200b9b0: 0xffffffff8116d7e8 0x0000000000000100
> 0xffffffc80200b9c0: 0xffffffd8055b9400 0xffffffd8055b9400
> 0xffffffc80200b9d0: 0xffffffc80200b9f0 0xffffffff8047c526
> 0xffffffc80200b9e0: 0xffffffc80200ba30 0xffffffff8047fe9a
>
> The assembler dump of the function preambula:
> add sp,sp,-16
> sd s0,8(sp)
> add s0,sp,16
>
> In the fist stack frame, where ra is not stored on the stack we can
> observe:
>
> 0(sp) 8(sp)
> .---------------------------------------------.
> sp->| frame->fp | frame->ra (saved fp) |
> |---------------------------------------------|
> fp->| .... | .... |
> |---------------------------------------------|
> | | |
>
> and in the code check is performed:
> if (regs && (regs->epc == pc) && (frame->fp & 0x7))
>
> I see no reason to check frame->fp value at all, because it is can be
> uninitialized value on the stack, so removed it. After the stacktrace
The purpose of this check is to distinguish leaf functions from non-leaf
functions, which have a different frame record layout. regs->epc does not
necessarily point to somewhere in a leaf function, for example if a non-leaf
function was preempted by an interrupt. So it incorrect to remove the check
entirely and always consider the first frame in the stack trace to be a leaf
function.
You are correct that for a leaf function, frame->fp could be uninitialized or an
arbitrary saved register, so the existing check is wrong. A better check would
be based on frame->ra, which is always pushed. For non-leaf functions, frame->ra
is a kernel text address. For leaf functions, frame->ra is a stack address. So
checking if frame->ra is within the current stack (between `low` and `high`)
should be sufficient to detect this case.
> shows as expect:
>
> [<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c
> [<ffffffff80485758>] regmap_mmio_read+0x24/0x52
> [<ffffffff8047c526>] _regmap_bus_reg_read+0x1a/0x22
> [<ffffffff8047fe9a>] _regmap_read+0x5c/0xea
> [<ffffffff80480376>] _regmap_update_bits+0x76/0xc0
> ...
> ---[ end trace 0000000000000000 ]---
>
> Fixes: f766f77a74f5 ("riscv/stacktrace: Fix stack output without ra on the stack top")
> Signed-off-by: Matthew Bystrin <dev.mbstr at gmail.com>
> ---
> I've catched this bug on v6.1 with gcc 12.2.0 (Debian 12.2.0-13). Different
> compiler versions generate the same assembler code. So I think this is not a
> compiler issue.
Unfortunately, this is indeed a mismatch between the compiler and the documented
ABI[1]. The ABI was created based on the existing GCC behavior[2]. However, the
GCC behavior is different for leaf functions. Specifically, the logic for
determining if ra needs to be saved on the stack does not check if frame
pointers are enabled. Since GCC was not trying to maintain a specific stack
frame layout, and just pushing registers sequentially as needed, this makes the
frame record layout different for leaf functions. So existing versions of GCC
are incompatible with the ABI as written.
This was fixed for GCC 14 as a side effect of adding the
-m(no)-omit-leaf-frame-pointer option[3]. After this change, ra is always saved
if a frame record is being saved, so the frame record layout is always the same,
matching the ABI.
For the kernel, since we want frame records to be available for stack walking,
we should pass -mno-omit-leaf-frame-pointer if that option is supported. If the
option is supported, we know the ABI is fixed, so in that case we can omit the
workaround described above.
We may still want a compiler version check, because there is a much simpler GCC
change that would fix the ABI incompatibility in older branches without adding
the new option:
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dddd72aa237..58722507dcb 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3894,7 +3894,8 @@ riscv_save_reg_p (unsigned int regno)
if (regno == HARD_FRAME_POINTER_REGNUM && frame_pointer_needed)
return true;
- if (regno == RETURN_ADDR_REGNUM && crtl->calls_eh_return)
+ if (regno == RETURN_ADDR_REGNUM && (crtl->calls_eh_return
+ || frame_pointer_needed))
return true;
/* If this is an interrupt handler, then must save extra registers. */
[1]:
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#frame-pointer-convention
[2]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/18
[3]:
https://github.com/gcc-mirror/gcc/commit/39663298b5934831a0125e12f113ebd83248c3be
Regards,
Samuel
>
> arch/riscv/kernel/stacktrace.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
> index 64a9c093aef9..8138c68270c9 100644
> --- a/arch/riscv/kernel/stacktrace.c
> +++ b/arch/riscv/kernel/stacktrace.c
> @@ -55,7 +55,7 @@ void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
> /* Unwind stack frame */
> frame = (struct stackframe *)fp - 1;
> sp = fp;
> - if (regs && (regs->epc == pc) && (frame->fp & 0x7)) {
> + if (regs && regs->epc == pc) {
> fp = frame->ra;
> pc = regs->ra;
> } else {
More information about the linux-riscv
mailing list