[LEDE-DEV] [PATCH v2] ramips: add support for Ubiquiti EdgeRouter X-SFP

p.wassi at gmx.at
Thu Jun 15 02:25:18 PDT 2017


> If you're crashing the box my guess would be there's a bug in the cake
> qdisc somewhere. What happens if you run SQM with fq_codel instead?

I switched over to fq_codel + simple.qos two days ago.
At first, the frequently appearing errors were gone and all seemed fine.
However, after ~30 h and ~4 GB of traffic, I got a single event again:

> [140116.830000] INFO: rcu_sched self-detected stall on CPU
> [140116.830000] INFO: rcu_sched detected stalls on CPUs/tasks:
> [140116.830000] 	0-...: (1 GPs behind) idle=101/2/0 softirq=766722/766730 fqs=0 
> [140116.830000] 	
> [140116.830000] (detected by 1, t=14267 jiffies, g=285924, c=285923, q=448)
> [140116.830000] Task dump for CPU 0:
> [140116.830000] swapper/0       R
> [140116.830000]   running task        0     0      0 0x00100000
> Stack :
> [140116.830000]  804affe0 00000400 00000000 7ac136ec 00007f6e ffffffff 00007009 771202c0       
> [140116.830000]  804bb48c 00000001 8045bca0 804c0000 00000001 8ffc39dc 00000000 00000000       
> [140116.830000]  00000000 8000c1cc 11000403 00000003 804ae000 804afea8 bfbf0000 80062e74       
> [140116.830000]  11000403 00000003 00000001 804c0000 d0800400 8000c1e4 80520000 804c0000       
> [140116.830000]  80520000 803c5fcc 80520000 804c0000 80520000 80505ce4 80520000 804dfbe4       
> [140116.830000]  ...Call Trace:
> [140116.830000] [<803c7c98>] __schedule+0x5d4/0x7a4
> [140116.830000] [<8000c1cc>] r4k_wait_irqoff+0x0/0x20
> [140116.830000] rcu_sched kthread starved for 14267 jiffies! g285924 c285923 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
> [140116.830000] rcu_sched       S
> [140116.830000]     0     7      2 0x00100000
> Stack :
> [140116.830000]  804bb5f4 8fc52340 81235bc0 00000000 81235bc0 00000000 81235bc0 8050fbc0       
> [140116.830000]  8121c320 00d52039 8121c320 8fc6be50 804c0000 00000001 804c0000 804c0000       
> [140116.830000]  804c35b0 803c7ed4 00d52039 804c0000 8fc6be50 8121c320 00d52039 803ca838       
> [140116.830000]  804bb5f4 00000001 804c3480 804c35b0 804c0000 00000001 00000000 8121c460       
> [140116.830000]  00d52039 8007b964 8fc52340 0e800001 804c3480 00000001 804c0000 00000000       
> [140116.830000]  ...Call Trace:
> [140116.830000] [<803c7c98>] __schedule+0x5d4/0x7a4
> [140116.830000] [<803c7ed4>] schedule+0x6c/0x84
> [140116.830000] [<803ca838>] schedule_timeout+0x160/0x19c
> [140116.830000] [<80078ea0>] rcu_gp_kthread+0x7f4/0x7fc
> [140116.830000] [<80044b98>] kthread+0xd8/0xec
> [140116.830000] [<8000a318>] ret_from_kernel_thread+0x14/0x1c
> [140116.830000] 	0-...: (1 GPs behind) idle=101/2/0 softirq=766722/766730 fqs=1 
> [140116.830000] 	 (t=14267 jiffies g=285924 c=285923 q=448)
> [140116.830000] Task dump for CPU 0:
> [140116.830000] swapper/0       R  running task        0     0      0 0x00100004
> [140116.830000] Stack : 00000000 800694d0 00000000 00000000 00000000 800694d0 0000001d 00000006
> [140116.830000]         00000006 804c0000 00000000 00000000 00000000 00000000 00000000 80520000
> [140116.830000]         00000000 804bdea0 804bb490 804c0000 804bb490 00000000 00000000 00000000
> [140116.830000]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [140116.830000]         00000000 00000000 00000000 00000000 00000000 0004091f 00000000 804bdea0
> [140116.830000]         ...
> [140116.830000] Call Trace:
> [140116.830000] [<8000f640>] show_stack+0x50/0x84
> [140116.830000] [<800a40b0>] rcu_dump_cpu_stacks+0xdc/0x110
> [140116.830000] [<8007981c>] rcu_check_callbacks+0x2cc/0x7c4
> [140116.830000] [<8007bd60>] update_process_times+0x34/0x70
> [140116.830000] [<8008c6a8>] tick_sched_timer+0x238/0x2a0
> [140116.830000] [<8007cbec>] __hrtimer_run_queues+0x10c/0x1d4
> [140116.830000] [<8007ce3c>] hrtimer_interrupt+0xec/0x2ac
> [140116.830000] [<802afd5c>] gic_compare_interrupt+0x2c/0x40
> [140116.830000] [<8006fa90>] handle_percpu_devid_irq+0xc4/0x18c
> [140116.830000] [<8006a7bc>] generic_handle_irq+0x24/0x3c
> [140116.830000] [<802039d8>] gic_handle_local_int+0x94/0xd4
> [140116.830000] [<80203b94>] gic_irq_dispatch+0x10/0x20
> [140116.830000] [<8006a7bc>] generic_handle_irq+0x24/0x3c
> [140116.830000] [<8000c2c8>] do_IRQ+0x1c/0x34
> [140116.830000] [<80202c80>] plat_irq_dispatch+0xb4/0xdc
> [140116.830000] [<8000a820>] except_vec_vi_end+0xb4/0xc0


@Paul: yeah, FS#764 really seems to be related. Same CPU there.

From what I've seen on the device here:
both 4.4.71 and 4.9.31 are affected. With cake it happens rather frequently,
with fq_codel once every two days or so. I'll keep it running with
fq_codel and see when the next error is triggered.
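
For anyone watching a log for the next occurrence, here is a minimal Python
sketch that pulls the stall summary lines out of saved dmesg output and
converts the reported jiffies into seconds. The function name
find_rcu_stalls and the default hz=100 are assumptions for illustration;
the real tick rate depends on the CONFIG_HZ the kernel was built with, so
pass the right value for your build.

```python
import re

# Kernel timestamp at the start of a (possibly "> "-quoted) log line.
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")
# The "t=<n> jiffies" field printed in the RCU stall summary lines.
JIFFIES_RE = re.compile(r"\bt=(\d+) jiffies")


def find_rcu_stalls(lines, hz=100):
    """Return (timestamp_s, stall_jiffies, stall_seconds) for each summary line.

    hz is an ASSUMPTION about the kernel's CONFIG_HZ; adjust it for your build.
    """
    stalls = []
    for line in lines:
        ts = TS_RE.search(line)
        jf = JIFFIES_RE.search(line)
        if ts and jf:
            jiffies = int(jf.group(1))
            stalls.append((float(ts.group(1)), jiffies, jiffies / hz))
    return stalls


if __name__ == "__main__":
    # Two lines taken from the log quoted in this mail.
    sample = [
        "[140116.830000] INFO: rcu_sched self-detected stall on CPU",
        "[140116.830000] (detected by 1, t=14267 jiffies, g=285924, c=285923, q=448)",
    ]
    for ts, jiffies, secs in find_rcu_stalls(sample):
        print(f"uptime {ts:.2f}s: stalled {jiffies} jiffies (~{secs:.1f}s at assumed HZ)")
```

Note that one stall event can print more than one "t=... jiffies" line (the
log above has two), so entries sharing a timestamp belong to the same event.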

Regards,
P. Wassi

> 
> -Toke
> 


