am335x: 5.18.x: system stalling

Yegor Yefremov yegorslists at googlemail.com
Wed Jun 1 03:03:57 PDT 2022


On Wed, Jun 1, 2022 at 11:28 AM Ard Biesheuvel <ardb at kernel.org> wrote:
>
> On Wed, 1 Jun 2022 at 10:08, Ard Biesheuvel <ardb at kernel.org> wrote:
> >
> > On Wed, 1 Jun 2022 at 09:59, Arnd Bergmann <arnd at arndb.de> wrote:
> > >
> > > On Wed, Jun 1, 2022 at 9:36 AM Yegor Yefremov
> > > <yegorslists at googlemail.com> wrote:
> > > > On Tue, May 31, 2022 at 5:23 PM Arnd Bergmann <arnd at arndb.de> wrote:
> > > > > I've pushed a modified branch now, with that fix on the broken commit,
> > > > > and another change to make CONFIG_IRQSTACKS user-selectable rather
> > > > > than always enabled. That should tell us if the problem is in the SMP
> > > > > patching or in the irqstacks.
> > > > >
> > > > > Can you test the top of this branch with CONFIG_IRQSTACKS disabled,
> > > > > and (if that still stalls) retest the fixed commit f0191ea5c2e5 ("[PART 1]
> > > > > ARM: implement THREAD_INFO_IN_TASK for uniprocessor systems")?
> > > >
> > > > 1. the top of this branch with CONFIG_IRQSTACKS disabled stalls
> > > > 2. f0191ea5c2e5 with the same config does not stall
> > >
> > > Ok, perfect, that does narrow down the problem quite a bit: The final
> > > patch has seven changes, all of which can be done individually because
> > > in each case the simplified version in f0191ea5c2e5 is meant to run
> > > the exact same instructions as the version after the change, when running
> > > on a uniprocessor machine such as your am335x.
> > >
> > > You have already shown earlier that the get_current() and
> > > __my_cpu_offset() functions are not to blame here, as reverting
> > > only those does not change the behavior.
> > >
> > > This leaves the is_smp() check in set_current(), and the
> > > four macros in <asm/assembler.h>. I don't see anything obviously
> > > wrong with any of those five, but I would bet on the macros
> > > here. Can you try bisecting into this commit, maybe reverting
> > > the changes to set_current and get_current first, and then
> > > narrowing it down to (hopefully) a single macro that causes the
> > > problem?
> > >
> >
> > set_current() is never called by the primary CPU, which is why the
> > is_smp() check was removed from there in 57a420435edcb0b94 ("ARM: drop
> > pointless SMP check on secondary startup path").
> >
> > So that leaves only the four macros in asm/assembler.h, but I don't
> > see anything obviously wrong with those either.
>
> I pushed a patch on top of Arnd's branch at the link below that gets
> rid of the subsections, and uses normal branches (and code patching)
> to switch between the thread ID register and the LDR to retrieve the
> CPU offset and the current pointer. I cannot explain whether or
> why it would make a difference, but I think it's worth a try.

The link to your repo is missing.

Yegor


