usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s
Stefan Wahren
stefan.wahren at i2se.com
Mon May 8 13:22:08 PDT 2017
> Stefan Wahren <stefan.wahren at i2se.com> hat am 25. April 2017 um 20:11 geschrieben:
>
>
> > Stefan Wahren <stefan.wahren at i2se.com> hat am 22. April 2017 um 22:50 geschrieben:
> >
> >
> > Hi,
> >
> > > Eric Anholt <eric at anholt.net> hat am 20. April 2017 um 20:54 geschrieben:
> > >
> > >
> > > Stefan Wahren <stefan.wahren at i2se.com> writes:
> > >
> > > > Hi,
> > > >
> > > >> Doug Anderson <dianders at chromium.org> hat am 18. April 2017 um 22:41 geschrieben:
> > > >>
> > > >>
> > > >> It's hard to know for sure that all of this time is really in
> > > >> urb_enqueue(). Possible we could have task switched out and been
> > > >> blocked elsewhere. Using ftrace to get more fine-grained timings
> > > >> would be useful. ktime_get(), ktime_sub(), and ktime_to_us() are your
> > > >> friends here if you want to use trace_printk.
> > > >
> > > > i'm a newbie to ftrace, so i hope this would be helpful.
> > > >
> > > > # connect PL2303 to the onboard hub
> > > > # echo 0 > options/sleep-time
> > > > # echo 0 > function_profile_enabled
> > > > # echo 1 > function_profile_enabled
> > > > # ./usb_test
> > > > # Waiting for at least 20 seconds and then disconnect PL2303
> > > > # echo 0 > function_profile_enabled
> > > > # cat trace_stat/function0
> > > >
> > > > Function Hit Time Avg s^2
> > > > -------- --- ---- --- ---
> > > > bcm2835_handle_irq 361347 219567633 us 607.636 us 1485199 us
> > > > __handle_domain_irq 1082482 212639551 us 196.437 us 3642030 us
> > > > generic_handle_irq 1082482 100592051 us 92.927 us 50511334 us
> > > > irq_exit 1082482 98197771 us 90.715 us 29649040 us
> > > > handle_level_irq 1082482 95812379 us 88.511 us 51910093 us
> > >
> > > If I'm reading this output right, we're spending half of our interrupt
> > > processing time in irq_exit(), so even if dwc2's interrupt was free (the
> > > generic_handle_irq() chain), we'd be eating about half the CPU getting
> > > back out of the interrupt handler, right?
> > >
> > > I don't really know anything about DWC2 or USB, but is there any way we
> > > could mitigate the interrupt frequency with this hardware? If nothing
> > > else, could we loop reading gintsts until it reads back 0?
> >
> > first of all i updated my kernel to 4.11rc7 and the issue still occures. Today i had some time to investigate this issue further and i made some interesting observations:
> >
> > 1. The lockup doesn't occure always after starting usb_test. In rare cases i was able to run the program without lockup.
> > 2. In case the lockup occured we are always able to "rescue" the BCM2835 from this state by sending some serial data to the PL2303.
>
> Based on this scenario i patched the interrupt routine to detect the interrupt storm and normal condition. So i can dump the global and host registers in both situations (bad and goodcase).
>
> Here is the diff between both register dumps, maybe someone see something interesting:
>
> --- badcase.txt 2017-04-25 18:02:59.000000000 +0000
> +++ goodcase.txt 2017-04-25 18:02:59.000000000 +0000
> @@ -4,12 +4,12 @@
> dwc2_dump_global_registers: GAHBCFG @0xCC850008 : 0x00000031
> dwc2_dump_global_registers: GUSBCFG @0xCC85000C : 0x20001700
> dwc2_dump_global_registers: GRSTCTL @0xCC850010 : 0x80000000
> - dwc2_dump_global_registers: GINTSTS @0xCC850014 : 0x06000021
> + dwc2_dump_global_registers: GINTSTS @0xCC850014 : 0x06000029
> dwc2_dump_global_registers: GINTMSK @0xCC850018 : 0xF300080E
> - dwc2_dump_global_registers: GRXSTSR @0xCC85001C : 0x00070044
> + dwc2_dump_global_registers: GRXSTSR @0xCC85001C : 0x000F0001
> dwc2_dump_global_registers: GRXFSIZ @0xCC850024 : 0x00000306
> dwc2_dump_global_registers: GNPTXFSIZ @0xCC850028 : 0x01000306
> - dwc2_dump_global_registers: GNPTXSTS @0xCC85002C : 0x1D080100
> + dwc2_dump_global_registers: GNPTXSTS @0xCC85002C : 0x18080100
> ...
@Doug, @John Y., @Johan: Any further ideas to narrow down the reason for this interrupt storm?
Regards
Stefan
More information about the linux-rpi-kernel
mailing list