am335x: 5.18.x: system stalling

Ard Biesheuvel ardb at kernel.org
Mon May 30 08:14:58 PDT 2022


On Mon, 30 May 2022 at 15:54, Arnd Bergmann <arnd at arndb.de> wrote:
>
> On Sat, May 28, 2022 at 9:28 PM Yegor Yefremov
> <yegorslists at googlemail.com> wrote:
> >
> > On Sat, May 28, 2022 at 3:14 PM Arnd Bergmann <arnd at arndb.de> wrote:
> > >
> > > On Sat, May 28, 2022 at 3:01 PM Yegor Yefremov
> > > <yegorslists at googlemail.com> wrote:
> > > > On Sat, May 28, 2022 at 11:07 AM Ard Biesheuvel <ardb at kernel.org> wrote:
> > > > In file included from ./include/linux/irqflags.h:17,
> > > >                  from ./arch/arm/include/asm/bitops.h:28,
> > > >                  from ./include/linux/bitops.h:33,
> > > >                  from ./include/linux/log2.h:12,
> > > >                  from kernel/bounds.c:13:
> > > > ./arch/arm/include/asm/percpu.h: In function ‘__my_cpu_offset’:
> > > > ./arch/arm/include/asm/percpu.h:32:9: error: ‘__per_cpu_offset’
> > > > undeclared (first use in this function); did you mean
> > > > ‘__my_cpu_offset’?
> > > >    32 |  return __per_cpu_offset[0];
> > > >       |         ^~~~~~~~~~~~~~~~
> > > >       |         __my_cpu_offset
> > > > ./arch/arm/include/asm/percpu.h:32:9: note: each undeclared identifier
> > > > is reported only once for each function it appears in
> > >
> > > I think you just missed the line in my patch that adds the
> > > "extern unsigned long __per_cpu_offset[];" variable declaration.
> >
> > So, I tried both variants and both led to stalls.
>
> I'm running out of ideas here.  Going back to the original bisection,
> I rebased Ard's patches in a way that you should be able to build the
> config for each patch, and I split up the "ARM: implement
> THREAD_INFO_IN_TASK for uniprocessor systems" commit in yet
> another way, hoping to get something left over that points to the
> bug. Can you try bisecting through the top commits of
>
> https://kernel.org/pub/scm/linux/kernel/git/soc/soc.git am335x-stall-test
>
> starting maybe with "52d240871760 irqchip: nvic: Use
> GENERIC_IRQ_MULTI_HANDLER" as the patch that is almost certainly
> going to be ok?
>
> At some point I fear we may have to give up and just mark the v6+SMP
> configuration as broken, which is something we have considered in the
> past but ended up always keeping around for the purpose of testing
> omap2plus_defconfig and imx_v6_v7_defconfig. Note that on production
> systems you probably don't want to use that config anyway, and should
> either stick to a uniprocessor build, or disable the ARMv6 support.
>

Yeah, I am also running out of ideas. One question, though: does the
RCU detected stall always occur in the same place? I.e., how similar
are the backtraces of the stalls between different occurrences?
Perhaps we could narrow down where in the code we are stalling, and
gain some more understanding of the root cause.
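
To make the comparison concrete, here is a rough sketch of how the
backtraces from several stalls could be grouped by their leading frames.
It is only an illustration: the "rcu: INFO:" marker used to split the log
into reports, the "symbol+0xoff/0xsize" frame format, and the helper names
(split_stalls, signature, summarize) are all assumptions, not something
taken from the logs in this thread.

```python
import re
from collections import Counter

# Matches "symbol+0xoff/0xsize" as commonly printed in kernel backtraces.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def split_stalls(log_text, marker="rcu: INFO:"):
    """Split a log into one list of lines per stall report.

    The marker string is an assumption about how each report starts.
    """
    chunks, current = [], None
    for line in log_text.splitlines():
        if marker in line:
            if current is not None:
                chunks.append(current)
            current = []
        if current is not None:
            current.append(line)
    if current is not None:
        chunks.append(current)
    return chunks


def signature(chunk, depth=5):
    """Return the first `depth` symbol names found in a report, in order."""
    frames = []
    for line in chunk:
        frames.extend(FRAME_RE.findall(line))
    return tuple(frames[:depth])


def summarize(log_text, depth=5):
    """Count how many stall reports share the same leading frames."""
    return Counter(signature(c, depth) for c in split_stalls(log_text))
```

Feeding the captured console output of several boots through summarize()
would show whether one signature dominates. If it does, that points at a
specific place in the code; if every report has a different signature, the
stall location is probably incidental.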


