am335x: 5.18.x: system stalling

Yegor Yefremov yegorslists at googlemail.com
Tue May 31 07:16:46 PDT 2022


On Tue, May 31, 2022 at 10:36 AM Yegor Yefremov
<yegorslists at googlemail.com> wrote:
>
> On Mon, May 30, 2022 at 5:15 PM Ard Biesheuvel <ardb at kernel.org> wrote:
> >
> > On Mon, 30 May 2022 at 15:54, Arnd Bergmann <arnd at arndb.de> wrote:
> > >
> > > On Sat, May 28, 2022 at 9:28 PM Yegor Yefremov
> > > <yegorslists at googlemail.com> wrote:
> > > >
> > > > On Sat, May 28, 2022 at 3:14 PM Arnd Bergmann <arnd at arndb.de> wrote:
> > > > >
> > > > > On Sat, May 28, 2022 at 3:01 PM Yegor Yefremov
> > > > > <yegorslists at googlemail.com> wrote:
> > > > > > On Sat, May 28, 2022 at 11:07 AM Ard Biesheuvel <ardb at kernel.org> wrote:
> > > > > > In file included from ./include/linux/irqflags.h:17,
> > > > > >                  from ./arch/arm/include/asm/bitops.h:28,
> > > > > >                  from ./include/linux/bitops.h:33,
> > > > > >                  from ./include/linux/log2.h:12,
> > > > > >                  from kernel/bounds.c:13:
> > > > > > ./arch/arm/include/asm/percpu.h: In function ‘__my_cpu_offset’:
> > > > > > ./arch/arm/include/asm/percpu.h:32:9: error: ‘__per_cpu_offset’
> > > > > > undeclared (first use in this function); did you mean
> > > > > > ‘__my_cpu_offset’?
> > > > > >    32 |  return __per_cpu_offset[0];
> > > > > >       |         ^~~~~~~~~~~~~~~~
> > > > > >       |         __my_cpu_offset
> > > > > > ./arch/arm/include/asm/percpu.h:32:9: note: each undeclared identifier
> > > > > > is reported only once for each function it appears in
> > > > >
> > > > > I think you just missed the line in my patch that adds the
> > > > > "extern unsigned long __per_cpu_offset[];" variable declaration.
> > > >
> > > > So, I tried both variants and both led to stalls.
> > >
> > > I'm running out of ideas here.  Going back to the original bisection,
> > > I rebased Ard's patches in a way that you should be able to build the
> > > config for each patch, and I split up the "ARM: implement
> > > THREAD_INFO_IN_TASK for uniprocessor systems" commit in yet
> > > another way, hoping to get something left over that points to the
> > > bug. Can you try bisecting through the top commits of
> > >
> > > https://kernel.org/pub/scm/linux/kernel/git/soc/soc.git am335x-stall-test
> > >
> > > starting maybe with "52d240871760 irqchip: nvic: Use
> > > GENERIC_IRQ_MULTI_HANDLER" as the patch that is almost certainly
> > > going to be ok?
> > >
> > > At some point I fear we may have to give up and just mark the v6+SMP
> > > configuration as broken, which is something we have considered in the
> > > past but ended up always keeping around for the purpose of testing
> > > omap2plus_defconfig and imx_v6_v7_defconfig. Note that on production
> > > systems you probably don't want to use that config anyway, and should
> > > either stick to a uniprocessor build, or disable the ARMv6 support.
> > >
> >
> > Yeah, I am also running out of ideas. One question, though: does the
> > RCU detected stall always occur in the same place? I.e., how similar
> > are the backtraces of the stalls between different occurrences?
> > Perhaps we could narrow down where in the code we are stalling, and
> > gain some more understanding of the root cause.
>
> I have attached 4 crash logs and will start bisecting Arnd's branch.

My bisect results:

git bisect log
git bisect start
# good: [52d24087176055d5994ac98378426421b2d6d653] irqchip: nvic: Use GENERIC_IRQ_MULTI_HANDLER
git bisect good 52d24087176055d5994ac98378426421b2d6d653
# bad: [2d3456213319c0277ee6082946c43c3afacca9b4] [PART 2] ARM: implement THREAD_INFO_IN_TASK for uniprocessor system
git bisect bad 2d3456213319c0277ee6082946c43c3afacca9b4
# good: [20e50fc1187d82d6d9ef80c01cf8e11d476f6227] ARM: 9176/1: avoid literal references in inline assembly
git bisect good 20e50fc1187d82d6d9ef80c01cf8e11d476f6227
# good: [59f3cd822afe6445b2864d0cf1a73ca6edd24f42] ARM: smp: defer TPIDRURO update for SMP v6 configurations too
git bisect good 59f3cd822afe6445b2864d0cf1a73ca6edd24f42
# bad: [b6b3b4814e77d2f5a7517297e9ac1d1aa1cda103] [PART 1] ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems
git bisect bad b6b3b4814e77d2f5a7517297e9ac1d1aa1cda103
# good: [dccfc18999cf4b4e518f01d5c7c578426166e5f2] ARM: v7m: enable support for IRQ stacks
git bisect good dccfc18999cf4b4e518f01d5c7c578426166e5f2
# first bad commit: [b6b3b4814e77d2f5a7517297e9ac1d1aa1cda103] [PART 1] ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems

Note, though, that commit b6b3b4814e77d2f5a7517297e9ac1d1aa1cda103
produced a kernel that printed no output at all after the bootloader
had started it, which is a different failure from the stall.

Commit 2d3456213319c0277ee6082946c43c3afacca9b4 showed the expected stalling.

Yegor
