am335x: 5.18.x: system stalling

Arnd Bergmann arnd at arndb.de
Tue May 24 07:36:54 PDT 2022


On Tue, May 24, 2022 at 3:38 PM Yegor Yefremov
<yegorslists at googlemail.com> wrote:
> On Sat, May 21, 2022 at 9:41 PM Arnd Bergmann <arnd at arndb.de> wrote:
> > On Thu, May 19, 2022 at 5:52 PM Yegor Yefremov <yegorslists at googlemail.com> wrote:
> >
> > Ok, so this is just a serial-port-based driver, which means the
> > follow-up question is what you use for your UART. Is this one of the
> > USB-serial ones or an on-chip UART? Which driver?
>
> This is the following chain: am335x -> musb -> ftdi_sio (FT-X flavor).
>
> I have also tried another system with two FT4232 chips (RS232 devices)
> and performed transmission tests. This had no effect; the system
> didn't stall.

Ok, I see. I looked at ftdi_sio, and found a couple of slightly suspicious
code paths in the FT-X specific bits, but after looking more closely I
found nothing actually wrong with them.

It might still be worth trying more combinations of those, e.g. whether the
FT-X UART also fails without the CAN adapter, or whether it fails on the
other machine.

> > > > > CONFIG_DMA_API_DEBUG is still likely to pinpoint the bug, but I might also
> > > > > just see it by looking at the right source file.
> > > >
> > > > I'll try to get more debug info with CONFIG_DMA_API_DEBUG.
> > >
> > > DMA_API_DEBUG showed nothing new. But disabling the CPUfreq driver
> > > "solved" the problem. I have tried different governors and got these
> > > two groups:
> > >
> > > ondemand, schedutil - cause the problem
> > > conservative, powersave, performance and userspace - don't cause the problem
> > >
> > > So far, I have only seen the same debug output that I initially
> > > sent, and in most cases the system stalls without any output.
> >
> > Ok, so that sounds like it happens when you change the frequency.
> > I assume this means you are using drivers/cpufreq/omap-cpufreq.c?
>
> Yes.
>
> > When using the userspace governor, do you see problems when you
> > manually change the frequency from sysfs?
>
> No, I can switch between 300MHz and 600MHz and perform CAN tests.
> Everything goes well.

One more idea: maybe this is a case where we actually run out of stack
space? Without VMAP stacks, that may easily go unnoticed, but with
VMAP stacks it is supposed to produce an obvious error message with a
backtrace. If we have a callchain that involves

can_xmit -> tty -> tty_usb -> usb -> musb -> schedule -> cpufreq_update_util
 -> omap_cpufreq

we might run out of the 8KB stack area. It's probably not this, but if you
want to rule it out, try doubling the kernel stack to 16KB with

#define THREAD_SIZE_ORDER       2

       Arnd
