boot flooded with unwind: Index not found

Russell King (Oracle) linux at armlinux.org.uk
Tue Mar 8 16:01:26 PST 2022


On Wed, Mar 02, 2022 at 11:22:29AM +0000, Russell King (Oracle) wrote:
> On Wed, Mar 02, 2022 at 12:19:40PM +0100, Ard Biesheuvel wrote:
> > On Wed, 2 Mar 2022 at 12:12, Russell King (Oracle)
> > <linux at armlinux.org.uk> wrote:
> > >
> > > On Wed, Mar 02, 2022 at 11:09:49AM +0100, Corentin Labbe wrote:
> > > > The crash disappeared (but the suspicious RCU usage is still here).
> > >
> > > As the trace on those is:
> > >
> > > [    0.239629]  unwind_backtrace from show_stack+0x10/0x14
> > > [    0.239654]  show_stack from init_stack+0x1c54/0x2000
> > >
> > > unwind_backtrace() and show_stack() are both C code, the compiler will
> > > emit the unwind information for it. show_stack() isn't called from
> > > assembly code, only from C code, so the next function's unwind
> > > information should also be generated by the compiler.
> > >
> > > However, init_stack is not a function - it's an array of unsigned long.
> > > There is no way this should appear in the trace, and this suggests that
> > > the unwind of show_stack() has gone wrong.
> > >
> > > I don't see anything obvious in Ard's changes that would cause that
> > > though.
> > >
> > > Did it used to work fine with previous versions of linux-next - those
> > > versions where we had Ard's "arm-vmap-stacks-v6" tag merged in
> > > (commit 2fa394824493) and did this only appear when I merged
> > > "arm-ftrace-for-rmk" (commit 74aaaa1e9bba) ? Did merging
> > > "arm-ftrace-for-rmk" cause any change in your .config?
> > >
> > 
> > I can reproduce the RCU warnings, and I have tracked this down to the
> > change I made to return_address() for the graph tracer, which I
> > thought was justified after removing the call to
> > kernel_text_address():
> > 
> > --- a/arch/arm/include/asm/ftrace.h
> > +++ b/arch/arm/include/asm/ftrace.h
> > @@ -35,26 +35,8 @@ static inline unsigned long
> > ftrace_call_adjust(unsigned long addr)
> > 
> >  #ifndef __ASSEMBLY__
> > 
> > -#if defined(CONFIG_FRAME_POINTER) && !defined(CONFIG_ARM_UNWIND)
> > -/*
> > - * return_address uses walk_stackframe to do it's work.  If both
> > - * CONFIG_FRAME_POINTER=y and CONFIG_ARM_UNWIND=y walk_stackframe uses unwind
> > - * information.  For this to work in the function tracer many functions would
> > - * have to be marked with __notrace.  So for now just depend on
> > - * !CONFIG_ARM_UNWIND.
> > - */
> > -
> >  void *return_address(unsigned int);
> > 
> > -#else
> > -
> > -static inline void *return_address(unsigned int level)
> > -{
> > -       return NULL;
> > -}
> > -
> > -#endif
> > -
> >  #define ftrace_return_address(n) return_address(n)
> > 
> >  #define ARCH_HAS_SYSCALL_MATCH_SYM_NAME
> > 
> > However, the function graph tracer works happily with this bit
> > reverted, and so that is probably the best course of action here.
> > 
> > I have already sent the patch that reintroduces the
> > kernel_text_address() check - would you prefer a v2 of that one with
> > this change incorporated? Or a second patch that just reverts the
> > above? (Given that the bogus dereference was invoked from
> > return_address() as well, I suspect that this change would make the
> > get_kernel_nofault() change I proposed in this thread redundant)
> 
> I'd prefer patches on top of my devel-stable branch, thanks.

To reiterate what I've just put on IRC - we have not got to the bottom
of this problem yet - it still very much exists.

There seems to be a fundamental issue with the unwinder: it now
appears to go wrong, failing to unwind beyond a couple of functions,
and the address it comes out with appears to be incorrect. I've only
just discovered this because I created my very own bug, and yet again,
the timing sucks with the proximity of the merge window.

I'm getting:

[   13.198803] [<c0017728>] (unwind_backtrace) from [<c0012828>] (show_stack+0x10/0x14)
[   13.198820] [<c0012828>] (show_stack) from [<c2be78d4>] (0xc2be78d4)

for the WARN_ON() stacktrace, and that address that apparently called
show_stack() is most definitely rubbish and incorrect. This makes any
WARN_ON() condition undebuggable.

This is with both 9183/1 and 9184/1 applied on top of pulling your
"arm-ftrace-for-rmk" tag and also with just the "arm-vmap-stacks-v6"
tag. This seems to point at one of these patches breaking the
unwinder:

a1c510d0adc6 ARM: implement support for vmap'ed stacks
532319b9c418 ARM: unwind: disregard unwind info before stack frame is set up
4ab6827081c6 ARM: unwind: dump exception stack from calling frame
b6506981f880 ARM: unwind: support unwinding across multiple stacks

Given that the unwinder is broken, I wonder whether 9183/1 and 9184/1
are actually required.

I did try to point this problem out a few emails back:

"As the trace on those is:

[    0.239629]  unwind_backtrace from show_stack+0x10/0x14
[    0.239654]  show_stack from init_stack+0x1c54/0x2000

unwind_backtrace() and show_stack() are both C code, the compiler will
emit the unwind information for it. show_stack() isn't called from
assembly code, only from C code, so the next function's unwind
information should also be generated by the compiler.

However, init_stack is not a function - it's an array of unsigned long.
There is no way this should appear in the trace, and this suggests that
the unwind of show_stack() has gone wrong."

In Corentin's case, there is no way init_stack should ever appear in
the stack trace. In my case, it's not init_stack, but 0xc2be78d4.
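To illustrate why neither entry can be a genuine caller, here is a
minimal user-space sketch of the kind of sanity check involved: a return
address is only plausible if it lies inside the text section. This is
not kernel code. The SKETCH_STEXT/SKETCH_ETEXT bounds and the sketch_*
helper names are made up for illustration; the real bounds would come
from the kernel's own section symbols.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical text-section bounds, chosen only for this illustration.
 * They are assumptions, not values taken from the failing kernel.
 */
#define SKETCH_STEXT 0xc0008000UL
#define SKETCH_ETEXT 0xc1000000UL

/* Return true if addr falls inside the assumed text range. */
static bool sketch_is_text_address(unsigned long addr)
{
	return addr >= SKETCH_STEXT && addr < SKETCH_ETEXT;
}

/*
 * Count the leading entries of a backtrace that look like code
 * addresses, stopping at the first entry that does not.
 */
static size_t sketch_count_valid_frames(const unsigned long *pcs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!sketch_is_text_address(pcs[i]))
			break;
	}
	return i;
}
```

Fed the three addresses from my trace (0xc0017728, 0xc0012828,
0xc2be78d4) and the assumed bounds above, the first two pass and the
third is rejected, which matches the point at which the unwind goes
wrong.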

Can you try testing out a dummy WARN_ON(1) test in your kernel please?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


