am335x: 5.18.x: system stalling

Tony Lindgren tony at atomide.com
Thu May 26 23:50:29 PDT 2022


* Arnd Bergmann <arnd at arndb.de> [220527 06:35]:
> On Fri, May 27, 2022 at 6:44 AM Yegor Yefremov
> <yegorslists at googlemail.com> wrote:
> > On Thu, May 26, 2022 at 4:16 PM Arnd Bergmann <arnd at arndb.de> wrote:
> > >
> > > On Thu, May 26, 2022 at 2:37 PM Yegor Yefremov
> > > <yegorslists at googlemail.com> wrote:
> > > > On Thu, May 26, 2022 at 10:19 AM Ard Biesheuvel <ardb at kernel.org> wrote:
> > > > >
> > > > > On Thu, 26 May 2022 at 08:20, Tony Lindgren <tony at atomide.com> wrote:
> > > > > >
> > > > > > * Yegor Yefremov <yegorslists at googlemail.com> [220526 05:45]:
> > > > > > > On Tue, May 24, 2022 at 4:19 PM Tony Lindgren <tony at atomide.com> wrote:
> > > > > > > > Maybe also try with CONFIG_MUSB_PIO_ONLY=y to see if it makes things
> > > > > > > > better or worse :)
> > > > > > >
> > > > > > > PIO is always the last resort :-) And now it proves it again. With
> > > > > > > PIO_ONLY the system doesn't stall.
> > > > > >
> > > > > > OK great :) So it has something to do with drivers/dma/ti/cppi41.c, or
> > > > > > with drivers/usb/musb/cppi_dma.c or whatever the dma for am335x here
> > > > > > is. Or maybe there's something using stack for buffers being passed to
> > > > > > dma again that breaks with vmap stack.
> > > > > >
> > > > >
> > > > > In order to confirm this theory, could you please try rebuilding your
> > > > > kernel with CONFIG_VMAP_STACK disabled, and leave everything else as
> > > > > before?
> > > >
> > > > I have disabled the CONFIG_VMAP_STACK option:
> > > >
> > > > # zcat /proc/config.gz | grep VMAP_STACK
> > > > CONFIG_HAVE_ARCH_VMAP_STACK=y
> > > > # CONFIG_VMAP_STACK is not set
> > > >
> > > > The system stalls.
> > >
> > > Ok, I guess that means we can stop looking for invalid DMA buffers
> > > on stacks. Out of the original commits you listed as possible causes,
> > > we can also rule out 23d9a9280efe ("ARM: 9177/1: disable vmap'ed
> > > stacks on suspend-capable SMP configs") and cafc0eab1689
> > > ("ARM: v7m: enable support for IRQ stacks"). It could still be
> > > 9c46929e7989 ("ARM: implement THREAD_INFO_IN_TASK for
> > > uniprocessor systems") and 5fe41793bc78 ("ARM: 9176/1: avoid
> > > literal references in inline assembly") or possibly the merge.
> > >
> > > Can you post the whole .config file somewhere for reference?
> > > In particular, do you have CONFIG_SMP, CONFIG_LD_IS_LLD
> > > or CURRENT_POINTER_IN_TPIDRURO set?
> >
> > This is my config [1] and this is the system in question [2].
> >
> > [1] https://github.com/visionsystemsgmbh/onrisc_br_bsp/blob/master/board/vscom/baltos/linux-experimental-config
> 
> Thanks! The first thing I noticed in here is that this config enables both
> CONFIG_ARCH_MULTI_V6 (for OMAP2) and CONFIG_SMP, which
> gets you into a couple of corner cases that nobody else hits in practice.
> 
> Can you still reproduce the problem if you turn off both of these?

Based on what we just discussed on #armlinux, testing before and after
commit 9c46929e7989 ("ARM: implement THREAD_INFO_IN_TASK for uniprocessor
systems") might be a good idea, as that commit causes some config options
to get enabled that were not enabled earlier.
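
To see which options actually change across that commit, something like
the following rough sketch could be used. It only compares two saved
.config files; the function name and the /tmp paths are made up for
illustration, and the build commands in the comments assume a
cross-build with ARCH=arm. The kernel's scripts/diffconfig should give a
fuller diff if you prefer that.

```shell
#!/bin/sh
# newly_enabled OLD NEW
# Print the CONFIG_ symbols that are set to y or m in NEW but not in OLD.
# (Hypothetical helper, not part of the kernel tree.)
newly_enabled() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    # Keep only the symbol names of options enabled as built-in or module.
    grep -E '^CONFIG_[A-Za-z0-9_]+=(y|m)$' "$1" | cut -d= -f1 | sort -u > "$old_list"
    grep -E '^CONFIG_[A-Za-z0-9_]+=(y|m)$' "$2" | cut -d= -f1 | sort -u > "$new_list"
    # comm -13 prints the lines that appear only in the second file.
    comm -13 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}

# Assumed workflow, starting from the same .config both times:
#   git checkout 9c46929e7989^ && make ARCH=arm olddefconfig
#   cp .config /tmp/config.before
#   git checkout 9c46929e7989 && make ARCH=arm olddefconfig
#   cp .config /tmp/config.after
#   newly_enabled /tmp/config.before /tmp/config.after
```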

Another thing that might help is to bisect again and make sure the
CONFIG_VMAP_STACK option stays disabled at every step, so that issues
related to vmap stack are kept out of the way.
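
For the bisect, a minimal sketch of how the option could be forced off
and verified at each step is below. The two helper names are
hypothetical; running scripts/config --disable VMAP_STACK from the
kernel tree should have the same effect as the first one. The build
commands in the comments again assume ARCH=arm.

```shell
#!/bin/sh
# disable_vmap_stack FILE
# Rewrite FILE so that CONFIG_VMAP_STACK is explicitly "not set".
# (Hypothetical helper, not part of the kernel tree.)
disable_vmap_stack() {
    cfg=$1
    tmp=$(mktemp)
    # Drop any existing setting; CONFIG_HAVE_ARCH_VMAP_STACK does not
    # match because the pattern is anchored at the start of the line.
    grep -v -E '^(# )?CONFIG_VMAP_STACK[= ]' "$cfg" > "$tmp"
    echo '# CONFIG_VMAP_STACK is not set' >> "$tmp"
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# vmap_stack_is_off FILE
# Succeed if FILE does not enable CONFIG_VMAP_STACK.
vmap_stack_is_off() {
    ! grep -q '^CONFIG_VMAP_STACK=y$' "$1"
}

# Assumed use at each bisect step:
#   disable_vmap_stack .config
#   make ARCH=arm olddefconfig
#   vmap_stack_is_off .config || echo "VMAP_STACK got re-enabled"
#   ... build, boot and test, then mark the step with
#   git bisect good   or   git bisect bad
```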

Regards,

Tony


