usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s

Stefan Wahren stefan.wahren at i2se.com
Tue Apr 25 14:11:09 EDT 2017


> Stefan Wahren <stefan.wahren at i2se.com> hat am 22. April 2017 um 22:50 geschrieben:
> 
> 
> Hi,
> 
> > Eric Anholt <eric at anholt.net> hat am 20. April 2017 um 20:54 geschrieben:
> > 
> > 
> > Stefan Wahren <stefan.wahren at i2se.com> writes:
> > 
> > > Hi,
> > >
> > >> Doug Anderson <dianders at chromium.org> hat am 18. April 2017 um 22:41 geschrieben:
> > >> 
> > >> 
> > >> It's hard to know for sure that all of this time is really in
> > >> urb_enqueue().  Possible we could have task switched out and been
> > >> blocked elsewhere.  Using ftrace to get more fine-grained timings
> > >> would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
> > >> friends here if you want to use trace_printk.
> > >
> > > i'm a newbie to ftrace, so i hope this would be helpful.
> > >
> > > # connect PL2303 to the onboard hub
> > > # echo 0 > options/sleep-time
> > > # echo 0 > function_profile_enabled
> > > # echo 1 > function_profile_enabled
> > > # ./usb_test
> > > # Waiting for at least 20 seconds and then disconnect PL2303
> > > # echo 0 > function_profile_enabled
> > > # cat trace_stat/function0
> > >
> > >   Function                               Hit    Time            Avg             s^2
> > >   --------                               ---    ----            ---             ---
> > >   bcm2835_handle_irq                  361347    219567633 us     607.636 us      1485199 us  
> > >   __handle_domain_irq                1082482    212639551 us     196.437 us      3642030 us  
> > >   generic_handle_irq                 1082482    100592051 us     92.927 us       50511334 us 
> > >   irq_exit                           1082482    98197771 us     90.715 us       29649040 us 
> > >   handle_level_irq                   1082482    95812379 us     88.511 us       51910093 us 
> > 
> > If I'm reading this output right, we're spending half of our interrupt
> > processing time in irq_exit(), so even if dwc2's interrupt was free (the
> > generic_handle_irq() chain), we'd be eating about half the CPU getting
> > back out of the interrupt handler, right?
> > 
> > I don't really know anything about DWC2 or USB, but is there any way we
> > could mitigate the interrupt frequency with this hardware?  If nothing
> > else, could we loop reading gintsts until it reads back 0?
> 
> first of all i updated my kernel to 4.11rc7 and the issue still occures. Today i had some time to investigate this issue further and i made some interesting observations:
> 
> 1. The lockup doesn't occure always after starting usb_test. In rare cases i was able to run the program without lockup.
> 2. In case the lockup occured we are always able to "rescue" the BCM2835 from this state by sending some serial data to the PL2303.

Based on this scenario i patched the interrupt routine to detect the interrupt storm and normal condition. So i can dump the global and host registers in both situations (bad and goodcase).

Here is the diff between both register dumps, maybe someone see something interesting:

--- badcase.txt	2017-04-25 18:02:59.000000000 +0000
+++ goodcase.txt	2017-04-25 18:02:59.000000000 +0000
@@ -4,12 +4,12 @@
  dwc2_dump_global_registers: GAHBCFG	 @0xCC850008 : 0x00000031
  dwc2_dump_global_registers: GUSBCFG	 @0xCC85000C : 0x20001700
  dwc2_dump_global_registers: GRSTCTL	 @0xCC850010 : 0x80000000
- dwc2_dump_global_registers: GINTSTS	 @0xCC850014 : 0x06000021
+ dwc2_dump_global_registers: GINTSTS	 @0xCC850014 : 0x06000029
  dwc2_dump_global_registers: GINTMSK	 @0xCC850018 : 0xF300080E
- dwc2_dump_global_registers: GRXSTSR	 @0xCC85001C : 0x00070044
+ dwc2_dump_global_registers: GRXSTSR	 @0xCC85001C : 0x000F0001
  dwc2_dump_global_registers: GRXFSIZ	 @0xCC850024 : 0x00000306
  dwc2_dump_global_registers: GNPTXFSIZ	 @0xCC850028 : 0x01000306
- dwc2_dump_global_registers: GNPTXSTS	 @0xCC85002C : 0x1D080100
+ dwc2_dump_global_registers: GNPTXSTS	 @0xCC85002C : 0x18080100
  dwc2_dump_global_registers: GI2CCTL	 @0xCC850030 : 0x00000000
  dwc2_dump_global_registers: GPVNDCTL	 @0xCC850034 : 0x00000000
  dwc2_dump_global_registers: GGPIO	 @0xCC850038 : 0x00000000
@@ -27,64 +27,64 @@
  dwc2_dump_host_registers: Host Global Registers
  dwc2_dump_host_registers: HCFG	 @0xCC850400 : 0x00000000
  dwc2_dump_host_registers: HFIR	 @0xCC850404 : 0x00001D4B
- dwc2_dump_host_registers: HFNUM	 @0xCC850408 : 0x0BF03CA1
- dwc2_dump_host_registers: HPTXSTS	 @0xCC850410 : 0x29080200
- dwc2_dump_host_registers: HAINT	 @0xCC850414 : 0x00000002
- dwc2_dump_host_registers: HAINTMSK	 @0xCC850418 : 0x00000002
+ dwc2_dump_host_registers: HFNUM	 @0xCC850408 : 0x029F3421
+ dwc2_dump_host_registers: HPTXSTS	 @0xCC850410 : 0x25080200
+ dwc2_dump_host_registers: HAINT	 @0xCC850414 : 0x00000001
+ dwc2_dump_host_registers: HAINTMSK	 @0xCC850418 : 0x00000001
  dwc2_dump_host_registers: HPRT0	 @0xCC850440 : 0x00001005
  dwc2_dump_host_registers: Host Channel 0 Specific Registers
- dwc2_dump_host_registers: HCCHAR	 @0xCC850500 : 0x01189840
- dwc2_dump_host_registers: HCSPLT	 @0xCC850504 : 0x8001C102
- dwc2_dump_host_registers: HCINT	 @0xCC850508 : 0x00000000
- dwc2_dump_host_registers: HCINTMSK	 @0xCC85050C : 0x00000000
- dwc2_dump_host_registers: HCTSIZ	 @0xCC850510 : 0x00080040
- dwc2_dump_host_registers: HCDMA	 @0xCC850514 : 0x4A07D300
+ dwc2_dump_host_registers: HCCHAR	 @0xCC850500 : 0x013C880A
+ dwc2_dump_host_registers: HCSPLT	 @0xCC850504 : 0x8000C102
+ dwc2_dump_host_registers: HCINT	 @0xCC850508 : 0x00000022
+ dwc2_dump_host_registers: HCINTMSK	 @0xCC85050C : 0x00000006
+ dwc2_dump_host_registers: HCTSIZ	 @0xCC850510 : 0x4008000A
+ dwc2_dump_host_registers: HCDMA	 @0xCC850514 : 0x4A093840
  dwc2_dump_host_registers: Host Channel 1 Specific Registers
  dwc2_dump_host_registers: HCCHAR	 @0xCC850520 : 0x01189840
  dwc2_dump_host_registers: HCSPLT	 @0xCC850524 : 0x8000C102
- dwc2_dump_host_registers: HCINT	 @0xCC850528 : 0x00000022
- dwc2_dump_host_registers: HCINTMSK	 @0xCC85052C : 0x00000006
+ dwc2_dump_host_registers: HCINT	 @0xCC850528 : 0x00000000
+ dwc2_dump_host_registers: HCINTMSK	 @0xCC85052C : 0x00000000
  dwc2_dump_host_registers: HCTSIZ	 @0xCC850530 : 0x00080040
- dwc2_dump_host_registers: HCDMA	 @0xCC850534 : 0x4A07D300
+ dwc2_dump_host_registers: HCDMA	 @0xCC850534 : 0x4A07D500
  dwc2_dump_host_registers: Host Channel 2 Specific Registers
  dwc2_dump_host_registers: HCCHAR	 @0xCC850540 : 0x01189840
  dwc2_dump_host_registers: HCSPLT	 @0xCC850544 : 0x8001C102
  dwc2_dump_host_registers: HCINT	 @0xCC850548 : 0x00000000
  dwc2_dump_host_registers: HCINTMSK	 @0xCC85054C : 0x00000000
  dwc2_dump_host_registers: HCTSIZ	 @0xCC850550 : 0x00080040
- dwc2_dump_host_registers: HCDMA	 @0xCC850554 : 0x4A07D300
+ dwc2_dump_host_registers: HCDMA	 @0xCC850554 : 0x4A07D500
  dwc2_dump_host_registers: Host Channel 3 Specific Registers
- dwc2_dump_host_registers: HCCHAR	 @0xCC850560 : 0x01189840
- dwc2_dump_host_registers: HCSPLT	 @0xCC850564 : 0x8001C102
+ dwc2_dump_host_registers: HCCHAR	 @0xCC850560 : 0x00D08040
+ dwc2_dump_host_registers: HCSPLT	 @0xCC850564 : 0x00000000
  dwc2_dump_host_registers: HCINT	 @0xCC850568 : 0x00000000
  dwc2_dump_host_registers: HCINTMSK	 @0xCC85056C : 0x00000000
- dwc2_dump_host_registers: HCTSIZ	 @0xCC850570 : 0x00080040
- dwc2_dump_host_registers: HCDMA	 @0xCC850574 : 0x4A07D300
+ dwc2_dump_host_registers: HCTSIZ	 @0xCC850570 : 0x8000003C
+ dwc2_dump_host_registers: HCDMA	 @0xCC850574 : 0x4A3B7544
  dwc2_dump_host_registers: Host Channel 4 Specific Registers
- dwc2_dump_host_registers: HCCHAR	 @0xCC850580 : 0x01189840
- dwc2_dump_host_registers: HCSPLT	 @0xCC850584 : 0x8001C102
+ dwc2_dump_host_registers: HCCHAR	 @0xCC850580 : 0x00D00040
+ dwc2_dump_host_registers: HCSPLT	 @0xCC850584 : 0x00000000
  dwc2_dump_host_registers: HCINT	 @0xCC850588 : 0x00000000
  dwc2_dump_host_registers: HCINTMSK	 @0xCC85058C : 0x00000000
- dwc2_dump_host_registers: HCTSIZ	 @0xCC850590 : 0x00080040
- dwc2_dump_host_registers: HCDMA	 @0xCC850594 : 0x4A07D300
+ dwc2_dump_host_registers: HCTSIZ	 @0xCC850590 : 0x20000008
+ dwc2_dump_host_registers: HCDMA	 @0xCC850594 : 0x4A3B74C8
  dwc2_dump_host_registers: Host Channel 5 Specific Registers
  dwc2_dump_host_registers: HCCHAR	 @0xCC8505A0 : 0x01189840
  dwc2_dump_host_registers: HCSPLT	 @0xCC8505A4 : 0x8000C102
  dwc2_dump_host_registers: HCINT	 @0xCC8505A8 : 0x00000000
  dwc2_dump_host_registers: HCINTMSK	 @0xCC8505AC : 0x00000000
  dwc2_dump_host_registers: HCTSIZ	 @0xCC8505B0 : 0x00080040
- dwc2_dump_host_registers: HCDMA	 @0xCC8505B4 : 0x4A07D300
+ dwc2_dump_host_registers: HCDMA	 @0xCC8505B4 : 0x4A07D500
  dwc2_dump_host_registers: Host Channel 6 Specific Registers
- dwc2_dump_host_registers: HCCHAR	 @0xCC8505C0 : 0x01189840
- dwc2_dump_host_registers: HCSPLT	 @0xCC8505C4 : 0x8001C102
+ dwc2_dump_host_registers: HCCHAR	 @0xCC8505C0 : 0x00D00040
+ dwc2_dump_host_registers: HCSPLT	 @0xCC8505C4 : 0x00000000
  dwc2_dump_host_registers: HCINT	 @0xCC8505C8 : 0x00000000
  dwc2_dump_host_registers: HCINTMSK	 @0xCC8505CC : 0x00000000
- dwc2_dump_host_registers: HCTSIZ	 @0xCC8505D0 : 0x00080040
- dwc2_dump_host_registers: HCDMA	 @0xCC8505D4 : 0x4A07D300
+ dwc2_dump_host_registers: HCTSIZ	 @0xCC8505D0 : 0x00000000
+ dwc2_dump_host_registers: HCDMA	 @0xCC8505D4 : 0x4B080000
  dwc2_dump_host_registers: Host Channel 7 Specific Registers
- dwc2_dump_host_registers: HCCHAR	 @0xCC8505E0 : 0x01189840
+ dwc2_dump_host_registers: HCCHAR	 @0xCC8505E0 : 0x013C880A
  dwc2_dump_host_registers: HCSPLT	 @0xCC8505E4 : 0x8001C102
  dwc2_dump_host_registers: HCINT	 @0xCC8505E8 : 0x00000000
  dwc2_dump_host_registers: HCINTMSK	 @0xCC8505EC : 0x00000000
- dwc2_dump_host_registers: HCTSIZ	 @0xCC8505F0 : 0x00080040
- dwc2_dump_host_registers: HCDMA	 @0xCC8505F4 : 0x4A07D300
+ dwc2_dump_host_registers: HCTSIZ	 @0xCC8505F0 : 0x4008000A
+ dwc2_dump_host_registers: HCDMA	 @0xCC8505F4 : 0x4A093840



More information about the linux-arm-kernel mailing list