usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s
johan at kernel.org
Tue Oct 17 01:52:10 PDT 2017
On Mon, Oct 16, 2017 at 01:49:11PM -0700, Julius Werner wrote:
> > d9a14b00 339317035 C Ii:1:004:1 -32:1 0
> > d9a14b00 339317049 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339318040 C Ii:1:004:1 -32:1 0
> > d9a14b00 339318057 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339319042 C Ii:1:004:1 -32:1 0
> > d9a14b00 339319056 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339329551 C Ii:1:004:1 -32:1 0
> > d9a14b00 339329571 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339330586 C Ii:1:004:1 -32:1 0
> > d9a14b00 339330601 S Ii:1:004:1 -115:1 10 <
> > d9a14b00 339331035 C Ii:1:004:1 -32:1 0
> Sorry for necromancing an old thread, but I just happened to read
> through this and thought someone might care:
> If I read that right, the usbmon output shows that the interrupt
> endpoint is stalled (keeps returning -EPIPE). A STALL is a special
> device-side USB condition that tells the host something is wrong and
> will persist until cleared manually. It seems that the driver isn't
> prepared for this (see
> drivers/usb/serial/pl2303.c#pl2303_read_int_callback) and just keeps
> resubmitting the URB, so it will stall again as fast as the endpoint
> allows it to. This may be the reason why you get so many transfers
> that it overwhelms the CPU.
That's a bug in the driver; we should not resubmit (without further
action) on -EPIPE.
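To illustrate the intended behaviour, here is a minimal userspace model of the status check an interrupt-in completion handler would make. It is a sketch under assumptions, not the pl2303 code: should_resubmit() is a hypothetical helper, and treating every other error as transient is a simplification made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model of the decision made at the end of an interrupt-in
 * completion handler. should_resubmit() is a hypothetical helper for
 * illustration only; it is not a kernel API.
 */
static bool should_resubmit(int urb_status)
{
	/* A URB status is either 0 or a negative errno value. */
	assert(urb_status <= 0);

	switch (urb_status) {
	case 0:
		/* Success: process the data and keep polling. */
		return true;
	case -ECONNRESET:
	case -ENOENT:
	case -ESHUTDOWN:
		/* URB was unlinked or the device is going away. */
		return false;
	case -EPIPE:
		/* Endpoint is halted: resubmitting would only stall again. */
		return false;
	default:
		/* Assumption: other errors are transient in this model. */
		return true;
	}
}
```

With this check, the -32 (-EPIPE) completions in the usbmon trace would end the polling instead of triggering another submission.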
> A fix would be to catch -EPIPE in that function and handle it
> explicitly (with either a CLEAR_STALL to the endpoint or a full USB
> reset... would have to look at the documentation for PL2303 to see
> what the stall actually means and how you're supposed to treat it).
Yes, but we can't just clear the halt from the completion handler, so
you'd typically have to schedule a work struct and call usb_clear_halt()
from there. Only then could we try resubmitting the URBs, but chances
are we'd just hit that stall again (with the hardware setup in
question). Note that no usb-serial driver currently implements any such
stall recovery; they just stop resubmitting the URB on -EPIPE.
Or at least so I thought. The generic implementation (which most drivers
rely on) and a few others get this right, but we have a number of legacy
drivers with custom implementations that do resubmit on -EPIPE
(including the pl2303 one).
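For reference, the deferred stall recovery described above (schedule work from the completion handler, clear the halt in process context, then resubmit) can be modelled in userspace roughly as follows. This is only a sketch: struct model_port and both functions are hypothetical names invented for the example, and plain flags stand in for schedule_work(), usb_clear_halt() and usb_submit_urb().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_port {
	bool halted;        /* endpoint is stalled on the device side */
	bool urb_in_flight; /* interrupt URB is currently submitted */
	bool work_pending;  /* recovery work has been scheduled */
};

/* Models the completion handler, which runs in atomic context. */
static void model_int_callback(struct model_port *port, int status)
{
	assert(port != NULL);
	port->urb_in_flight = false;

	if (status == -EPIPE) {
		/* Do not resubmit; defer recovery (schedule_work() in a driver). */
		port->halted = true;
		port->work_pending = true;
		return;
	}

	if (status == 0)
		port->urb_in_flight = true;	/* resubmit and keep polling */
	/* Any other status leaves the URB idle in this simplified model. */
}

/* Models the work function, which runs in process context and may sleep. */
static void model_recovery_work(struct model_port *port)
{
	assert(port != NULL);
	if (!port->work_pending)
		return;
	port->work_pending = false;

	/* Stands in for usb_clear_halt(), which can sleep. */
	port->halted = false;

	/* Only after the halt is cleared is the URB submitted again. */
	port->urb_in_flight = true;
}
```

Note that if the device stalls again right away, this loop simply repeats at a slower rate, so a real implementation would probably want to limit the number of recovery attempts.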
I'll go fix that up.