am335x: 5.18.x: system stalling

Arnd Bergmann arnd at arndb.de
Fri Jun 3 02:32:01 PDT 2022


On Fri, Jun 3, 2022 at 10:54 AM Yegor Yefremov
<yegorslists at googlemail.com> wrote:
> On Thu, Jun 2, 2022 at 2:27 PM Yegor Yefremov <yegorslists at googlemail.com> wrote:
> > On Thu, Jun 2, 2022 at 12:37 PM Ard Biesheuvel <ardb at kernel.org> wrote:

> > > That confirms my statement that smp.c cannot be the culprit, and
> > > appears to exonerate the pure asm pieces. I wonder if this is related
> > > to insufficient asm constraints on the C helpers, or just the cost
> > > model taking different decisions because the inline asm string is much
> > > longer. In any case, this opens up a couple of avenues we could
> > > explore to narrow this down further.
> > >
> > > As a quick check, can you try the below snippet applied onto the
> > > broken current.h build?
> > >
> > > --- a/arch/arm/include/asm/current.h
> > > +++ b/arch/arm/include/asm/current.h
> > > @@ -53,7 +53,8 @@ static __always_inline __attribute_const__ struct
> > > task_struct *get_current(void)
> > >             "   b       . + (2b - 0b)                           \n\t"
> > >             "   .popsection                                     \n\t"
> > >  #endif
> > > -           : "=r"(cur));
> > > +           : "=r"(cur)
> > > +           : "Q" (*(const unsigned long *)current_stack_pointer));
> >
> > Where is the current_stack_pointer defined?
> >
> > >  #elif __LINUX_ARM_ARCH__>= 7 || \
> > >        !defined(CONFIG_ARM_HAS_GROUP_RELOCS) || \
> > >        (defined(MODULE) && defined(CONFIG_ARM_MODULE_PLTS))
> > >
> > > Given that the problematic sequence appears to be in C code, could you
> > > please confirm whether or not the stall is reproducible when all the
> > > pieces that are used by the CAN stack (musb, slcan, ftdio-sio, etc)
> > > are built into the kernel rather than built as modules? Also, which
> > > GCC version are you using?
> >
> > For now, the CAN stack parts are built as modules. I'll try to compile them in.
> >
> > I'm using GCC 10.x
>
> I have tried your patch (see the attachment) and the system stalls.

This is with only get_current() patched on top of the working
f0191ea5c2e5 ("[PART 1] ARM: implement THREAD_INFO_IN_TASK for
uniprocessor systems"), right?

My best theory right now is that something in get_currnent() is wrong that
causes it to return the wrong task pointer, which in turn leads to
current->preempt_count to get out of sync. This may be related to the cppi41
dmaengine tasklet and effectively disables further softirqs including the timer
that triggers the RCU grace period.

When we finally switch tasks to the cpufreq worker thread, softirqs
can happen again because of the task switch, and at the next IRQ
the timer detects the stall.

> Will try GCC 11.x and also compiled-in drivers.

Ok. Maybe make sure all drivers are built-in here. I see both the CAN
layer and the cppi41 driver use softirqs, so to be on the safe side,
try to get to a running kernel that has no modules loaded at all at
the time you expect the stall.


One thing that could possibly go wrong with get_current() would be that
it fails to get patched for some reason, or it gets patched only after
it was already called. Since you run on an ARMv7 CPU as opposed to
an actual OMAP2410/ARM1136r0, it would then try to load the
variable from the uninitialized TPIDRURO register. If that happens,
the one-liner below should tell you exactly where, by triggering an
Oops. You can apply the patch on top for testing, it should have no
other effects if the patching part works correctly.

        Anrd

8<---

diff --git a/arch/arm/include/asm/current.h b/arch/arm/include/asm/current.h
index 2f9d79214b25..2a15832793c4 100644
--- a/arch/arm/include/asm/current.h
+++ b/arch/arm/include/asm/current.h
@@ -33,7 +33,7 @@ static inline __attribute_const__ struct task_struct
*get_current(void)
         */
        cur = __builtin_thread_pointer();
 #elif defined(CONFIG_CURRENT_POINTER_IN_TPIDRURO) || defined(CONFIG_SMP)
-       asm("0: mrc p15, 0, %0, c13, c0, 3                      \n\t"
+       asm("0: .long 0xe7f001f2                        \n\t" // BUG() trap
 #ifdef CONFIG_CPU_V6
            "1:                                                 \n\t"
            "   .subsection 1                                   \n\t"



More information about the linux-arm-kernel mailing list