[PATCH] riscv: stacktrace: fixed walk_stackframe()

Palmer Dabbelt palmer at dabbelt.com
Wed May 8 10:05:45 PDT 2024


On Wed, 08 May 2024 08:51:04 PDT (-0700), samuel.holland at sifive.com wrote:
> Hi Matthew,
>
> Thanks for the patch!
>
> On 2024-04-26 2:24 AM, Matthew Bystrin wrote:
>> If the load access fault occures in a function without callee
>> (CONFIG_FRAME_POINTER=y), when wrong stack trace will be displayed:
>>
>> [<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c
>> ---[ end trace 0000000000000000 ]---
>>
>> Registers dump:
>>     ra     0xffffffff80485758 <regmap_mmio_read+36>
>>     sp     0xffffffc80200b9a0
>>     fp     0xffffffc80200b9b0
>>     pc     0xffffffff804853ba <regmap_mmio_read32le+6>
>>
>> Stack dump:
>>     0xffffffc80200b9a0:  0xffffffc80200b9e0  0xffffffc80200b9e0
>>     0xffffffc80200b9b0:  0xffffffff8116d7e8  0x0000000000000100
>>     0xffffffc80200b9c0:  0xffffffd8055b9400  0xffffffd8055b9400
>>     0xffffffc80200b9d0:  0xffffffc80200b9f0  0xffffffff8047c526
>>     0xffffffc80200b9e0:  0xffffffc80200ba30  0xffffffff8047fe9a
>>
>> The assembler dump of the function preambula:
>>     add     sp,sp,-16
>>     sd      s0,8(sp)
>>     add     s0,sp,16
>>
>> In the fist stack frame, where ra is not stored on the stack we can
>> observe:
>>
>>         0(sp)                  8(sp)
>>         .---------------------------------------------.
>>     sp->|       frame->fp      | frame->ra (saved fp) |
>>         |---------------------------------------------|
>>     fp->|         ....         |         ....         |
>>         |---------------------------------------------|
>>         |                      |                      |
>>
>> and in the code check is performed:
>> 	if (regs && (regs->epc == pc) && (frame->fp & 0x7))
>>
>> I see no reason to check frame->fp value at all, because it is can be
>> uninitialized value on the stack, so removed it. After the stacktrace
>
> The purpose of this check is to distinguish leaf functions from non-leaf
> functions, which have a different frame record layout. regs->epc does not
> necessarily point to somewhere in a leaf function, for example if a non-leaf
> function was preempted by an interrupt. So it incorrect to remove the check
> entirely and always consider the first frame in the stack trace to be a leaf
> function.
>
> You are correct that for a leaf function, frame->fp could be uninitialized or an
> arbitrary saved register, so the existing check is wrong. A better check would
> be based on frame->ra, which is always pushed. For non-leaf functions, frame->ra
> is a kernel text address. For leaf functions, frame->ra is a stack address. So
> checking if frame->ra is within the current stack (between `low` and `high`)
> should be sufficient to detect this case.

I was just playing with this, looks like GCC is treating tail-call-only 
functions as leaf functions.  So I think whatever workaround we come up 
with loses the tail-call-only function in the backtrace, not sure if 
there's any way we can work around that as there's essentially no 
frame remaining for it.  From https://godbolt.org/z/7bMsEbqjn and 
GCC trunk

    int glob;
    
    void leaf(int x) {
        glob = x;
    }
    
    void tail(int x) {
        leaf(x);
    }
    
    void func(int x) {
        leaf(x);
        glob = x+1;
    }
    
    -O2 -fno-inline -fno-omit-frame-pointer -momit-leaf-frame-pointer
    
    leaf(int):
            lui     a5,%hi(glob)
            sw      a0,%lo(glob)(a5)
            ret
    tail(int):
            tail    leaf(int)
    func(int):
            addi    sp,sp,-32
            sd      s0,16(sp)
            sd      s1,8(sp)
    …        addi    sp,sp,32
            jr      ra
    glob:
            .zero   4

>
>> shows as expect:
>>
>> [<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c
>> [<ffffffff80485758>] regmap_mmio_read+0x24/0x52
>> [<ffffffff8047c526>] _regmap_bus_reg_read+0x1a/0x22
>> [<ffffffff8047fe9a>] _regmap_read+0x5c/0xea
>> [<ffffffff80480376>] _regmap_update_bits+0x76/0xc0
>> ...
>> ---[ end trace 0000000000000000 ]---
>>
>> Fixes: f766f77a74f5 ("riscv/stacktrace: Fix stack output without ra on the stack top")
>> Signed-off-by: Matthew Bystrin <dev.mbstr at gmail.com>
>> ---
>> I've catched this bug on v6.1 with gcc 12.2.0 (Debian 12.2.0-13). Different
>> compiler versions generate the same assembler code. So I think this is not a
>> compiler issue.
>
> Unfortunately, this is indeed a mismatch between the compiler and the documented
> ABI[1]. The ABI was created based on the existing GCC behavior[2]. However, the
> GCC behavior is different for leaf functions. Specifically, the logic for
> determining if ra needs to be saved on the stack does not check if frame
> pointers are enabled. Since GCC was not trying to maintain a specific stack
> frame layout, and just pushing registers sequentially as needed, this makes the
> frame record layout different for leaf functions. So existing versions of GCC
> are incompatible with the ABI as written.
>
> This was fixed for GCC 14 as a side effect of adding the
> -m(no)-omit-leaf-frame-pointer option[3]. After this change, ra is always saved
> if a frame record is being saved, so the frame record layout is always the same,
> matching the ABI.

Ya, I think so -- IIUC we don't strictly need the option because the 
default seems to follow -fno-omit-frame-pointer, so we only strictly need 
to check for the presence of -mno-omit-leaf-frame-pointer.  Probably 
cleaner to just stick the option in there when it's supported, though, 
just to be on the safe side.

> For the kernel, since we want frame records to be available for stack walking,
> we should pass -mno-omit-leaf-frame-pointer if that option is supported. If the
> option is supported, we know the ABI is fixed, so in that case we can omit the
> workaround described above.
>
> We may still want a compiler version check, because there is a much simpler GCC
> change that would fix the ABI incompatibility in older branches without adding
> the new option:
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index dddd72aa237..58722507dcb 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3894,7 +3894,8 @@ riscv_save_reg_p (unsigned int regno)
>    if (regno == HARD_FRAME_POINTER_REGNUM && frame_pointer_needed)
>      return true;
>
> -  if (regno == RETURN_ADDR_REGNUM && crtl->calls_eh_return)
> +  if (regno == RETURN_ADDR_REGNUM && (crtl->calls_eh_return
> +				      || frame_pointer_needed))
>      return true;
>
>    /* If this is an interrupt handler, then must save extra registers.  */

IMO we should try backporting the option to GCC-13, that's a cleaner 
user interface.

> [1]:
> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#frame-pointer-convention
> [2]: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/18
> [3]:
> https://github.com/gcc-mirror/gcc/commit/39663298b5934831a0125e12f113ebd83248c3be
>
> Regards,
> Samuel
>
>>
>>  arch/riscv/kernel/stacktrace.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
>> index 64a9c093aef9..8138c68270c9 100644
>> --- a/arch/riscv/kernel/stacktrace.c
>> +++ b/arch/riscv/kernel/stacktrace.c
>> @@ -55,7 +55,7 @@ void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
>>  		/* Unwind stack frame */
>>  		frame = (struct stackframe *)fp - 1;
>>  		sp = fp;
>> -		if (regs && (regs->epc == pc) && (frame->fp & 0x7)) {
>> +		if (regs && regs->epc == pc) {
>>  			fp = frame->ra;
>>  			pc = regs->ra;
>>  		} else {
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv



More information about the linux-riscv mailing list