[PATCH] riscv: stacktrace: fixed walk_stackframe()

Matthew Bystrin dev.mbstr at gmail.com
Thu May 9 15:13:41 PDT 2024


Thanks for taking your time to review the patch!

On 2024-05-08 06:51 PM, Samuel Holland wrote:
> A better check would be based on frame->ra, which is always pushed. For
> non-leaf functions, frame->ra is a kernel text address. For leaf functions,
> frame->ra is a stack address. So checking if frame->ra is within the current
> stack (between `low` and `high`) should be sufficient to detect this case.

I've came up with slightly different approach. Checking that frame->ra is not a
kernel text address seems like a more compact solution. Palmer, Samuel, what do
you think?

diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
index 64a9c093aef9..e63bb926d3d9 100644
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -55,7 +55,8 @@ void notrace walk_stackframe(struct task_struct *task, struct pt_regs *regs,
 		/* Unwind stack frame */
 		frame = (struct stackframe *)fp - 1;
 		sp = fp;
-		if (regs && (regs->epc == pc) && (frame->fp & 0x7)) {
+		if (regs && (regs->epc == pc) && !__kernel_text_address(frame->ra)) {
+			/* We hit function where ra is not saved on the stack */
 			fp = frame->ra;
 			pc = regs->ra;
 		} else {

On 2024-05-08 08:05 PM, Palmer Dabbelt wrote:
> I was just playing with this, looks like GCC is treating tail-call-only
> functions as leaf functions.  So I think whatever workaround we come up
> with loses the tail-call-only function in the backtrace, not sure if
> there's any way we can work around that as there's essentially no
> frame remaining for it.

Yes, we definitely will miss some functions. But at least we can observe
underlying call stack in older compiler versions.

On 2024-05-08 08:05 PM, Palmer Dabbelt wrote:
> > For the kernel, since we want frame records to be available for stack
> > walking, we should pass -mno-omit-leaf-frame-pointer if that option is
> > supported. If the option is supported, we know the ABI is fixed, so in that
> > case we can omit the workaround described above.
> >
> > We may still want a compiler version check, because there is a much simpler
> > GCC change that would fix the ABI incompatibility in older branches without
> > adding the new option:
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index dddd72aa237..58722507dcb 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -3894,7 +3894,8 @@ riscv_save_reg_p (unsigned int regno)
> >    if (regno == HARD_FRAME_POINTER_REGNUM && frame_pointer_needed)
> >      return true;
> >
> > -  if (regno == RETURN_ADDR_REGNUM && crtl->calls_eh_return)
> > +  if (regno == RETURN_ADDR_REGNUM && (crtl->calls_eh_return
> > +                                      || frame_pointer_needed))
> >      return true;
> >
> >    /* If this is an interrupt handler, then must save extra registers.  */
>
> IMO we should try backporting the option to GCC-13, that's a cleaner
> user interface.

Agreed. And what about older compiler versions? Is that a good idea to move
unwinding logic into the separate function which can have different
implementations (via conditional compilation) depending on the compiler version?

Now speaking about the implementation. I'm not sure where
`-mno-omit-leaf-frame-pointer` flag should be added. Do you have any suggestions
for a proper way to do it or maybe related examples? Also I can discuss it with
Masahiro.

-- 
Best regards,
Matthew Bystrin <dev.mbstr at gmail.com>



More information about the linux-riscv mailing list