[FS#764] MT7621: Any traffic shaping results in crashes/stack traces

LEDE Bugs lede-bugs at lists.infradead.org
Sat May 6 00:06:41 PDT 2017


A new Flyspray task has been opened.  Details are below. 

User who did this - Jaap Buurman (Mushoz) 

Attached to Project - LEDE Project
Summary - MT7621: Any traffic shaping results in crashes/stack traces
Task Type - Bug Report
Category - Base system
Status - Unconfirmed
Assigned To - 
Operating System - All
Severity - Medium
Priority - Very Low
Reported Version - Trunk
Due in Version - Undecided
Due Date - Undecided
Details - There have been a large number of bug reports involving MT7621 devices in combination with SQM. Debugging is difficult because the bug often causes a hard crash that leaves no log files. I believe I have some interesting details that might make it easier to debug.

**Device:** DIR-860L rev B1, but according to reports all MT7621 devices are affected.
**LEDE Version:** LEDE Reboot SNAPSHOT r4094-961c0ea 
**Steps to reproduce:** Run a dslreports.com speedtest with a large number of upload and download streams (32/32) with either SQM or QOS enabled on your WAN interface.

**Observations:**
  * It happens with both SQM-scripts _and_ QOS, so I don't believe the issue is specific to the SQM package. What the two packages have in common is that they both shape traffic.
  * It seems to be **load dependent**: egress/ingress limits of 100/100 and 200/200 Mbit/s crash less often than limits of 300/300 Mbit/s or higher.
  * It happens with every qdisc/script combination: Cake + piece of cake, fq_codel + simple, and fq_codel + simplest.
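To repeat the reproduction matrix above consistently, the SQM side of it can be generated instead of set by hand. The following is a minimal, hypothetical helper, not part of sqm-scripts. It assumes the sqm-scripts UCI format in /etc/config/sqm (rates given in kbit/s, script file names such as piece_of_cake.qos) and uses `eth0.2` purely as a placeholder WAN device name, which may differ on your router. It does not cover the QOS (qos-scripts) case.

```python
from itertools import product

# Qdisc/script pairs from the observations above (sqm-scripts file names assumed).
COMBOS = [
    ("cake", "piece_of_cake.qos"),
    ("fq_codel", "simple.qos"),
    ("fq_codel", "simplest.qos"),
]

# Symmetric egress/ingress limits in Mbit/s; crashes were more frequent at 300 and above.
RATES_MBIT = [100, 200, 300]


def sqm_stanza(iface: str, qdisc: str, script: str, rate_mbit: int) -> str:
    """Render one /etc/config/sqm queue section (sqm-scripts expects kbit/s)."""
    kbit = rate_mbit * 1000
    return "\n".join([
        "config queue 'wan'",
        "\toption enabled '1'",
        f"\toption interface '{iface}'",
        f"\toption download '{kbit}'",
        f"\toption upload '{kbit}'",
        f"\toption qdisc '{qdisc}'",
        f"\toption script '{script}'",
    ])


def reproduction_matrix(iface: str = "eth0.2"):
    """Yield (qdisc, script, rate_mbit, stanza) for every combination.

    "eth0.2" is a placeholder WAN device name, not taken from the report.
    """
    for (qdisc, script), rate in product(COMBOS, RATES_MBIT):
        yield qdisc, script, rate, sqm_stanza(iface, qdisc, script, rate)


if __name__ == "__main__":
    for qdisc, script, rate, stanza in reproduction_matrix():
        print(f"# {qdisc} + {script} at {rate}/{rate} Mbit/s")
        print(stanza)
        print()
```

Each generated stanza would replace the queue section in /etc/config/sqm before one 32/32-stream dslreports.com run, so every combination is tested at every limit with otherwise identical settings.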

**Crash log:**

There is usually no crash log because the router locks up hard and then reboots. However, I got lucky once and managed to capture a log of the event:

[  710.140000] INFO: rcu_sched detected stalls on CPUs/tasks:
[  710.150000] 	1-...: (257 GPs behind) idle=dfc/0/0 softirq=48167/48179 fqs=1 
[  710.160000] 	(detected by 2, t=6004 jiffies, g=13114, c=13113, q=1063)
[  710.170000] Task dump for CPU 1:
[  710.180000] swapper/1       R running      0     0      1 0x00100000
[  710.190000] Stack : 00000000 5b6c286a 000000a3 ffffffff 00000090 773742c0 804df2a4 80490000
[  710.190000] 	  8048c75c 00000001 00000001 8048c540 8048c724 80490000 00000000 800135e4
[  710.190000] 	  00000000 00000001 87c70000 87c71ec0 80490000 8005ec74 1100fc03 00000001
[  710.190000] 	  00000000 80490000 804df2a4 8005ec6c 80490000 8001b1a8 1100fc03 00000000
[  710.190000] 	  00000004 8048c4a0 000000a0 8001b1b0 8c94e220 00008018 dc124877 a0020044
[  710.190000] 	  ...
[  710.260000] Call Trace:
[  710.270000] [] __schedule+0x574/0x758
[  710.280000] [] r4k_wait_irqoff+0x0/0x20
[  710.290000] 
[  710.290000] rcu_sched kthread starved for 6016 jiffies! g13114 c13113 f0x0 s3 ->state=0x1
[  782.470000] INFO: rcu_sched detected stalls on CPUs/tasks:
[  782.470000] 	1-...: (0 ticks this GP) idle=12c/0/0 softirq=48179/48179 fqs=0 
[  782.470000] 	(detected by 0, t=6002 jiffies, g=13324, c=13323, q=1260)
[  782.470000] Task dump for CPU 1:
[  782.470000] swapper/1       R running      0     0      1 0x00100000
[  782.470000] Stack : 00000000 00000001 0000000a 00000000 00000000 00000001 804df2a4 80490000
[  782.470000] 	  8048c75c 00000001 00000001 8048c540 8048c724 80490000 00000000 800135e4
[  782.470000] 	  00000000 00000001 87c70000 87c71ec0 80490000 8005ec74 1100fc03 00000001
[  782.470000] 	  00000000 80490000 804df2a4 8005ec6c 80490000 8001b1a8 1100fc03 00000000
[  782.470000] 	  00000004 8048c4a0 000000a0 8001b1b0 8c94e220 00008018 dc124877 a0020044
[  782.470000] 	  ...
[  782.470000] Call Trace:
[  782.470000] [] __schedule+0x574/0x758
[  782.470000] [] r4k_wait_irqoff+0x0/0x20
[  782.470000] 
[  782.470000] rcu_sched kthread starved for 6002 jiffies! g13324 c13323 f0x0 s3 ->state=0x1
[  860.040000] INFO: rcu_sched detected stalls on CPUs/tasks:
[  860.050000] 	1-...: (0 ticks this GP) idle=5a8/0/0 softirq=48179/48179 fqs=0 
[  860.060000] 	(detected by 3, t=6004 jiffies, g=13501, c=13500, q=2389)
[  860.070000] Task dump for CPU 1:
[  860.080000] swapper/1       R running      0     0      1 0x00100000
[  860.090000] Stack : 00000000 00002cd1 00000000 777882c0 00000000 00000000 804df2a4 80490000
[  860.090000] 	  8048c75c 00000001 00000001 8048c540 8048c724 80490000 00000000 800135e4
[  860.090000] 	  00000000 00000001 87c70000 87c71ec0 80490000 8005ec74 1100fc03 00000001
[  860.090000] 	  00000000 80490000 804df2a4 8005ec6c 80490000 8001b1a8 1100fc03 00000000
[  860.090000] 	  00000004 8048c4a0 000000a0 8001b1b0 8c94e220 00008018 dc124877 a0020044
[  860.090000] 	  ...
[  860.160000] Call Trace:
[  860.170000] [] __schedule+0x574/0x758
[  860.180000] [] r4k_wait_irqoff+0x0/0x20
[  860.190000] 
[  860.190000] rcu_sched kthread starved for 6017 jiffies! g13501 c13500 f0x0 s3 ->state=0x1


I hope it contains useful information for tracking down this bug. If there is anything else I can supply or test in order to help the debugging process, please let me know.
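To compare this log with logs from other MT7621 reports, the stall reports can be summarized automatically. Below is a minimal sketch that groups each "rcu_sched detected stalls" report with the CPUs it lists and the "detected by N, t=... jiffies" line that follows. Two assumptions: the kernel runs with HZ=100 (check CONFIG_HZ for the actual build), and "t=" is read as roughly how long the grace period has been stalled, in jiffies. The parser does not look at the task dumps or call traces.

```python
import re
from dataclasses import dataclass, field

# Assumption: the kernel runs with HZ=100; check CONFIG_HZ for the actual build.
HZ = 100

STALL_RE = re.compile(r"^\s*\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: rcu_sched detected stalls")
CPU_RE = re.compile(r"^\s*\[\s*\d+\.\d+\]\s+(?P<cpu>\d+)-\.\.\.:")
DETECT_RE = re.compile(r"\(detected by (?P<by>\d+), t=(?P<t>\d+) jiffies")


@dataclass
class Stall:
    timestamp: float                                         # seconds since boot (dmesg time)
    stalled_cpus: list[int] = field(default_factory=list)    # CPUs holding up the grace period
    detected_by: int = -1                                    # CPU that reported the stall
    jiffies: int = 0                                         # the "t=" value

    @property
    def seconds(self) -> float:
        # Converts t= to seconds under the HZ assumption above.
        return self.jiffies / HZ


def parse_stalls(log: str) -> list[Stall]:
    """Group dmesg lines into one Stall per 'detected stalls' report."""
    stalls: list[Stall] = []
    for line in log.splitlines():
        if m := STALL_RE.match(line):
            stalls.append(Stall(timestamp=float(m["ts"])))
        elif stalls and (m := CPU_RE.match(line)):
            stalls[-1].stalled_cpus.append(int(m["cpu"]))
        elif stalls and (m := DETECT_RE.search(line)):
            stalls[-1].detected_by = int(m["by"])
            stalls[-1].jiffies = int(m["t"])
    return stalls


def summarize(stalls: list[Stall]) -> None:
    """Print one line per stall, including the gap since the previous report."""
    prev = None
    for s in stalls:
        gap = "" if prev is None else f", {s.timestamp - prev:.1f} s after the previous report"
        print(f"[{s.timestamp:.2f}] CPU(s) {s.stalled_cpus} stalled ~{s.seconds:.0f} s "
              f"(detected by CPU {s.detected_by}){gap}")
        prev = s.timestamp
```

Usage would be something like `summarize(parse_stalls(open("dmesg.txt").read()))`, where dmesg.txt is a hypothetical saved log. Applied to the log above, it shows CPU 1 as the stalled CPU in all three reports (detected by CPUs 2, 0 and 3), each lasting about 60 s under the HZ=100 assumption, with the reports roughly 72 s and 78 s apart.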

More information can be found at the following URL:
https://bugs.lede-project.org/index.php?do=details&task_id=764


