[FS#1170] mt7621: kernel errors - rcu_sched detected stalls on CPUs/tasks - again

LEDE Bugs lede-bugs at lists.infradead.org
Wed Nov 15 04:56:31 PST 2017


A new Flyspray task has been opened.  Details are below. 

User who did this - Kristian Evensen (kristrev) 

Attached to Project - LEDE Project
Summary - mt7621: kernel errors - rcu_sched detected stalls on CPUs/tasks - again
Task Type - Bug Report
Category - Kernel
Status - Unconfirmed
Assigned To - 
Operating System - All
Severity - Critical
Priority - Very Low
Reported Version - All
Due in Version - Undecided
Due Date - Undecided
Details - After the work done for issue FS#804, the rcu_sched error seemed to be gone. However, I am now starting to see it again. Usually, at least for me, the error happens when there are large amounts of traffic and I perform some network operation, such as restarting networking. My most reliable way of reproducing the error is as follows:

- Use iperf to flood the router with small packets. Other ways of stressing the CPU also work; for example, I triggered the error when I added very aggressive logging to the firewall.
- While the router is being flooded, restart networking (I am logged in to the router via UART).
- After a couple of network restarts, the error is triggered, and the following is written to syslog periodically:

[ 2251.870000] INFO: rcu_bh detected stalls on CPUs/tasks:
[ 2251.870000] 	2-...: (1 GPs behind) idle=ae1/140000000000001/0 softirq=212487/217796 fqs=4380 
[ 2251.870000] 	(detected by 1, t=6002 jiffies, g=-146, c=-147, q=4)
[ 2251.870000] Task dump for CPU 2:
[ 2251.870000] openvpn-mover.s R running      0  2598      1 0x08100004
[ 2251.870000] Stack : 8fa69998 800ebe38 00000000 8fa69998 57512e2b 000001fd 00000000 80035454
[ 2251.870000] 	  00000000 800edbd4 8fa69998 804b0000 00000000 00000000 00000004 00000000
[ 2251.870000] 	  00000000 8ea17850 8efc7ec0 800376d4 00000000 00000000 778b8930 00000012
[ 2251.870000] 	  00000000 004077cd 778d4000 00000000 778d55e8 778d6f7c 00000000 8002b280
[ 2251.870000] 	  ffbffeff ffffffff 00617772 706d742f 00000000 00000000 00000001 800379dc
[ 2251.870000] 	  ...
[ 2251.870000] Call Trace:
[ 2251.870000] [] __schedule+0x574/0x758

I can also sometimes trigger the issue simply by issuing the reboot command while the CPU is stressed. I have not applied any traffic shaping to my interface, and I see the error with both kernel 4.4 and 4.9 (i.e., LEDE 17.01 and master). I am not sure how to proceed with debugging this.

More information can be found at the following URL:
https://bugs.lede-project.org/index.php?do=details&task_id=1170