[BUG] irqchip: armada-370-xp: workqueue lockup
Steffen Trumtrar
s.trumtrar at pengutronix.de
Tue Sep 21 01:40:59 PDT 2021
Hi,
I noticed that after the following patch (commit
e52e73b7e9f7d08b8c2ef6fb1657105093e22a03):
From: Valentin Schneider <valentin.schneider at arm.com>
Date: Mon, 9 Nov 2020 09:41:18 +0000
Subject: [PATCH] irqchip/armada-370-xp: Make IPIs use
handle_percpu_devid_irq()
As done for the Arm GIC irqchips, move IPIs to handle_percpu_devid_irq() as
handle_percpu_devid_fasteoi_ipi() isn't actually required.
Signed-off-by: Valentin Schneider <valentin.schneider at arm.com>
Signed-off-by: Marc Zyngier <maz at kernel.org>
Link: https://lore.kernel.org/r/20201109094121.29975-3-valentin.schneider@arm.com
---
drivers/irqchip/irq-armada-370-xp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index d7eb2e93db8f..32938dfc0e46 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -382,7 +382,7 @@ static int armada_370_xp_ipi_alloc(struct irq_domain *d,
 		irq_set_percpu_devid(virq + i);
 		irq_domain_set_info(d, virq + i, i, &ipi_irqchip,
 				    d->host_data,
-				    handle_percpu_devid_fasteoi_ipi,
+				    handle_percpu_devid_irq,
 				    NULL, NULL);
 	}
With this patch applied, I get workqueue lockups on my Armada XP-based board.
When I run the following test on v5.15-rc2:
stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 120s
I get a backtrace like this:
stress-ng: info: [7740] dispatching hogs: 8 cpu, 4 io, 2 vm, 4 fork
[ 1670.169087] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1670.169102] (detected by 0, t=5252 jiffies, g=50257, q=3369)
[ 1670.169112] rcu: All QSes seen, last rcu_preempt kthread activity 5252 (342543-337291), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 1670.169121] rcu: rcu_preempt kthread timer wakeup didn't happen for 5251 jiffies! g50257 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[ 1670.169128] rcu: Possible timer handling issue on cpu=1 timer-softirq=20398
[ 1670.169132] rcu: rcu_preempt kthread starved for 5252 jiffies! g50257 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=1
[ 1670.169140] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 1670.169143] rcu: RCU grace-period kthread stack dump:
[ 1670.169146] task:rcu_preempt state:R stack: 0 pid: 13 ppid: 2 flags:0x00000000
[ 1670.169157] Backtrace:
[ 1670.169163] [<c0a19c20>] (__schedule) from [<c0a1a458>] (schedule+0x64/0x110)
[ 1670.169185] r10:00000001 r9:c190e000 r8:c137b690 r7:c137b69c r6:c190fed4 r5:c190e000
[ 1670.169189] r4:c197c880
[ 1670.169192] [<c0a1a3f4>] (schedule) from [<c0a20048>] (schedule_timeout+0xa8/0x1c0)
[ 1670.169206] r5:c1303d00 r4:0005258c
[ 1670.169209] [<c0a1ffa0>] (schedule_timeout) from [<c01a1664>] (rcu_gp_fqs_loop+0x120/0x3ac)
[ 1670.169227] r7:c137b69c r6:c1303d00 r5:c137b4c0 r4:00000000
[ 1670.169230] [<c01a1544>] (rcu_gp_fqs_loop) from [<c01a3dac>] (rcu_gp_kthread+0xfc/0x1b0)
[ 1670.169247] r10:c190ff5c r9:c1303d00 r8:c137b4c0 r7:c190e000 r6:c137b69e r5:c137b690
[ 1670.169251] r4:c137b69c
[ 1670.169253] [<c01a3cb0>] (rcu_gp_kthread) from [<c0153b14>] (kthread+0x16c/0x1a0)
[ 1670.169268] r7:00000000
[ 1670.169271] [<c01539a8>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38)
[ 1670.169282] Exception stack(0xc190ffb0 to 0xc190fff8)
[ 1670.169288] ffa0: ???????? ???????? ???????? ????????
[ 1670.169293] ffc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[ 1670.169297] ffe0: ???????? ???????? ???????? ???????? ???????? ????????
[ 1670.169305] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c01539a8
[ 1670.169310] r4:c19320c0 r3:00000000
[ 1670.169313] rcu: Stack dump where RCU GP kthread last ran:
[ 1670.169316] Sending NMI from CPU 0 to CPUs 1:
[ 1670.169327] NMI backtrace for cpu 1
[ 1670.169335] CPU: 1 PID: 7764 Comm: stress-ng-cpu Tainted: G W 5.15.0-rc2+ #5
[ 1670.169343] Hardware name: Marvell Armada 370/XP (Device Tree)
[ 1670.169346] PC is at 0x4bde7a
[ 1670.169354] LR is at 0x4bdf21
[ 1670.169359] pc : [<004bde7a>] lr : [<004bdf21>] psr: 20030030
[ 1670.169363] sp : beb8270c ip : 00004650 fp : beb8289c
[ 1670.169367] r10: 00e5e800 r9 : 00514760 r8 : 0000036b
[ 1670.169371] r7 : beb828a8 r6 : 000001f7 r5 : 000001fd r4 : 000bacd7
[ 1670.169375] r3 : 004bde30 r2 : 0000000b r1 : 000001fd r0 : 0001bbd7
[ 1670.169380] Flags: nzCv IRQs on FIQs on Mode USER_32 ISA Thumb Segment user
[ 1670.169386] Control: 10c5387d Table: 0334806a DAC: 00000055
[ 1670.169389] CPU: 1 PID: 7764 Comm: stress-ng-cpu Tainted: G W 5.15.0-rc2+ #5
[ 1670.169395] Hardware name: Marvell Armada 370/XP (Device Tree)
[ 1670.169398] Backtrace:
[ 1670.169402] [<c0a0b758>] (dump_backtrace) from [<c0a0b9a4>] (show_stack+0x20/0x24)
[ 1670.169418] r7:c18db400 r6:c7875fb0 r5:60030193 r4:c1099c7c
[ 1670.169421] [<c0a0b984>] (show_stack) from [<c0a11988>] (dump_stack_lvl+0x48/0x54)
[ 1670.169433] [<c0a11940>] (dump_stack_lvl) from [<c0a119ac>] (dump_stack+0x18/0x1c)
[ 1670.169445] r5:00000001 r4:20030193
[ 1670.169447] [<c0a11994>] (dump_stack) from [<c0109984>] (show_regs+0x1c/0x20)
[ 1670.169461] [<c0109968>] (show_regs) from [<c05f6af8>] (nmi_cpu_backtrace+0xc0/0x10c)
[ 1670.169474] [<c05f6a38>] (nmi_cpu_backtrace) from [<c010ffa4>] (do_handle_IPI+0x54/0x3b8)
[ 1670.169489] r7:c18db400 r6:00000017 r5:00000001 r4:00000007
[ 1670.169491] [<c010ff50>] (do_handle_IPI) from [<c0110330>] (ipi_handler+0x28/0x30)
[ 1670.169505] r10:c7875f58 r9:c7875fb0 r8:c7875f30 r7:c18db400 r6:00000017 r5:c13ecadc
[ 1670.169509] r4:c18d9300 r3:00000010
[ 1670.169511] [<c0110308>] (ipi_handler) from [<c0193200>] (handle_percpu_devid_irq+0xb4/0x288)
[ 1670.169525] [<c019314c>] (handle_percpu_devid_irq) from [<c018c4b4>] (handle_domain_irq+0x8c/0xc0)
[ 1670.169539] r9:c7875fb0 r8:00000007 r7:00000000 r6:c1863d80 r5:00000000 r4:c12781e0
[ 1670.169542] [<c018c428>] (handle_domain_irq) from [<c01012cc>] (armada_370_xp_handle_irq+0xdc/0x124)
[ 1670.169556] r10:00e5e800 r9:00514760 r8:10c5387d r7:c147d604 r6:c7875fb0 r5:000003fe
[ 1670.169560] r4:00000007 r3:00000007
[ 1670.169562] [<c01011f0>] (armada_370_xp_handle_irq) from [<c0100e58>] (__irq_usr+0x58/0x80)
[ 1670.169571] Exception stack(0xc7875fb0 to 0xc7875ff8)
[ 1670.169576] 5fa0: ???????? ???????? ???????? ????????
[ 1670.169580] 5fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[ 1670.169584] 5fe0: ???????? ???????? ???????? ???????? ???????? ????????
[ 1670.169590] r7:10c5387d r6:ffffffff r5:20030030 r4:004bde7a
[ 1690.589098] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 38s!
[ 1690.589133] Showing busy workqueues and worker pools:
[ 1690.589138] workqueue events_unbound: flags=0x2
[ 1690.589142] pwq 4: cpus=0-1 flags=0x4 nice=0 active=3/512 refcnt=5
[ 1690.589157] in-flight: 7:call_usermodehelper_exec_work
[ 1690.589177] pending: flush_memcg_stats_work, flush_memcg_stats_dwork
[ 1690.589198] workqueue events_power_efficient: flags=0x80
[ 1690.589203] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256 refcnt=6
[ 1690.589218] in-flight: 53:fb_flashcursor fb_flashcursor
[ 1690.589236] pending: neigh_periodic_work, neigh_periodic_work, do_cache_clean
[ 1690.589265] workqueue mm_percpu_wq: flags=0x8
[ 1690.589269] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 1690.589284] pending: vmstat_update
[ 1690.589301] workqueue edac-poller: flags=0xa000a
[ 1690.589305] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1 refcnt=4
[ 1690.589318] pending: edac_mc_workq_function
[ 1690.589331] inactive: edac_device_workq_function
[ 1690.589346] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=38s workers=3 idle: 7621 6478
[ 1690.589370] pool 4: cpus=0-1 flags=0x4 nice=0 hung=41s workers=3 idle: 6967 5672
[ 1721.313097] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 69s!
[ 1721.313136] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 72s!
[ 1721.313149] Showing busy workqueues and worker pools:
[ 1721.313154] workqueue events_unbound: flags=0x2
[ 1721.313158] pwq 4: cpus=0-1 flags=0x4 nice=0 active=3/512 refcnt=5
[ 1721.313173] in-flight: 7:call_usermodehelper_exec_work
[ 1721.313193] pending: flush_memcg_stats_work, flush_memcg_stats_dwork
[ 1721.313213] workqueue events_power_efficient: flags=0x80
[ 1721.313218] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256 refcnt=6
[ 1721.313234] in-flight: 53:fb_flashcursor fb_flashcursor
[ 1721.313251] pending: neigh_periodic_work, neigh_periodic_work, do_cache_clean
[ 1721.313282] workqueue mm_percpu_wq: flags=0x8
[ 1721.313285] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 1721.313301] pending: vmstat_update
[ 1721.313319] workqueue edac-poller: flags=0xa000a
[ 1721.313323] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1 refcnt=4
[ 1721.313336] pending: edac_mc_workq_function
[ 1721.313349] inactive: edac_device_workq_function
[ 1721.313366] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=69s workers=3 idle: 7621 6478
[ 1721.313390] pool 4: cpus=0-1 flags=0x4 nice=0 hung=72s workers=3 idle: 6967 5672
[ 1733.189086] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1733.189101] (detected by 0, t=21007 jiffies, g=50257, q=13112)
[ 1733.189111] rcu: All QSes seen, last rcu_preempt kthread activity 21007 (358298-337291), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 1733.189119] rcu: rcu_preempt kthread timer wakeup didn't happen for 21006 jiffies! g50257 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[ 1733.189126] rcu: Possible timer handling issue on cpu=1 timer-softirq=20834
[ 1733.189131] rcu: rcu_preempt kthread starved for 21007 jiffies! g50257 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=1
[ 1733.189138] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[ 1733.189141] rcu: RCU grace-period kthread stack dump:
[ 1733.189144] task:rcu_preempt state:R stack: 0 pid: 13 ppid: 2 flags:0x00000000
[ 1733.189156] Backtrace:
[ 1733.189162] [<c0a19c20>] (__schedule) from [<c0a1a458>] (schedule+0x64/0x110)
[ 1733.189184] r10:00000001 r9:c190e000 r8:c137b690 r7:c137b69c r6:c190fed4 r5:c190e000
[ 1733.189188] r4:c197c880
[ 1733.189191] [<c0a1a3f4>] (schedule) from [<c0a20048>] (schedule_timeout+0xa8/0x1c0)
[ 1733.189205] r5:c1303d00 r4:0005258c
[ 1733.189208] [<c0a1ffa0>] (schedule_timeout) from [<c01a1664>] (rcu_gp_fqs_loop+0x120/0x3ac)
[ 1733.189226] r7:c137b69c r6:c1303d00 r5:c137b4c0 r4:00000000
[ 1733.189229] [<c01a1544>] (rcu_gp_fqs_loop) from [<c01a3dac>] (rcu_gp_kthread+0xfc/0x1b0)
[ 1733.189246] r10:c190ff5c r9:c1303d00 r8:c137b4c0 r7:c190e000 r6:c137b69e r5:c137b690
[ 1733.189249] r4:c137b69c
[ 1733.189252] [<c01a3cb0>] (rcu_gp_kthread) from [<c0153b14>] (kthread+0x16c/0x1a0)
[ 1733.189267] r7:00000000
[ 1733.189270] [<c01539a8>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38)
[ 1733.189281] Exception stack(0xc190ffb0 to 0xc190fff8)
[ 1733.189287] ffa0: ???????? ???????? ???????? ????????
[ 1733.189292] ffc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[ 1733.189297] ffe0: ???????? ???????? ???????? ???????? ???????? ????????
[ 1733.189304] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c01539a8
[ 1733.189309] r4:c19320c0 r3:00000000
[ 1733.189312] rcu: Stack dump where RCU GP kthread last ran:
[ 1733.189315] Sending NMI from CPU 0 to CPUs 1:
[ 1733.189327] NMI backtrace for cpu 1
[ 1733.189335] CPU: 1 PID: 7755 Comm: stress-ng-cpu Tainted: G W 5.15.0-rc2+ #5
[ 1733.189343] Hardware name: Marvell Armada 370/XP (Device Tree)
[ 1733.189346] PC is at 0x4bdee0
[ 1733.189354] LR is at 0x4bdf21
[ 1733.189358] pc : [<004bdee0>] lr : [<004bdf21>] psr: 20030030
[ 1733.189363] sp : beb8270c ip : 00004650 fp : beb8289c
[ 1733.189367] r10: 00e5e800 r9 : 00514760 r8 : 00000358
[ 1733.189370] r7 : beb828a8 r6 : 00000047 r5 : 0000004d r4 : 000b2ab7
[ 1733.189375] r3 : 004bde10 r2 : 00001217 r1 : 0000004f r0 : 00000085
[ 1733.189379] Flags: nzCv IRQs on FIQs on Mode USER_32 ISA Thumb Segment user
[ 1733.189385] Control: 10c5387d Table: 0734006a DAC: 00000055
[ 1733.189389] CPU: 1 PID: 7755 Comm: stress-ng-cpu Tainted: G W 5.15.0-rc2+ #5
[ 1733.189395] Hardware name: Marvell Armada 370/XP (Device Tree)
[ 1733.189397] Backtrace:
[ 1733.189402] [<c0a0b758>] (dump_backtrace) from [<c0a0b9a4>] (show_stack+0x20/0x24)
[ 1733.189417] r7:c18db400 r6:c7375fb0 r5:60030193 r4:c1099c7c
[ 1733.189420] [<c0a0b984>] (show_stack) from [<c0a11988>] (dump_stack_lvl+0x48/0x54)
[ 1733.189432] [<c0a11940>] (dump_stack_lvl) from [<c0a119ac>] (dump_stack+0x18/0x1c)
[ 1733.189444] r5:00000001 r4:20030193
[ 1733.189446] [<c0a11994>] (dump_stack) from [<c0109984>] (show_regs+0x1c/0x20)
[ 1733.189460] [<c0109968>] (show_regs) from [<c05f6af8>] (nmi_cpu_backtrace+0xc0/0x10c)
[ 1733.189473] [<c05f6a38>] (nmi_cpu_backtrace) from [<c010ffa4>] (do_handle_IPI+0x54/0x3b8)
[ 1733.189488] r7:c18db400 r6:00000017 r5:00000001 r4:00000007
[ 1733.189490] [<c010ff50>] (do_handle_IPI) from [<c0110330>] (ipi_handler+0x28/0x30)
[ 1733.189504] r10:c7375f58 r9:c7375fb0 r8:c7375f30 r7:c18db400 r6:00000017 r5:c13ecadc
[ 1733.189508] r4:c18d9300 r3:00000010
[ 1733.189510] [<c0110308>] (ipi_handler) from [<c0193200>] (handle_percpu_devid_irq+0xb4/0x288)
[ 1733.189523] [<c019314c>] (handle_percpu_devid_irq) from [<c018c4b4>] (handle_domain_irq+0x8c/0xc0)
[ 1733.189538] r9:c7375fb0 r8:00000007 r7:00000000 r6:c1863d80 r5:00000000 r4:c12781e0
[ 1733.189540] [<c018c428>] (handle_domain_irq) from [<c01012cc>] (armada_370_xp_handle_irq+0xdc/0x124)
[ 1733.189555] r10:00e5e800 r9:00514760 r8:10c5387d r7:c147d604 r6:c7375fb0 r5:000003fe
[ 1733.189559] r4:00000007 r3:00000007
[ 1733.189561] [<c01011f0>] (armada_370_xp_handle_irq) from [<c0100e58>] (__irq_usr+0x58/0x80)
[ 1733.189570] Exception stack(0xc7375fb0 to 0xc7375ff8)
[ 1733.189575] 5fa0: ???????? ???????? ???????? ????????
[ 1733.189579] 5fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
[ 1733.189583] 5fe0: ???????? ???????? ???????? ???????? ???????? ????????
[ 1733.189589] r7:10c5387d r6:ffffffff r5:20030030 r4:004bdee0
[ 1752.029102] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 100s!
[ 1752.029137] Showing busy workqueues and worker pools:
[ 1752.029141] workqueue events_unbound: flags=0x2
[ 1752.029146] pwq 4: cpus=0-1 flags=0x4 nice=0 active=3/512 refcnt=5
[ 1752.029161] in-flight: 7:call_usermodehelper_exec_work
[ 1752.029180] pending: flush_memcg_stats_work, flush_memcg_stats_dwork
[ 1752.029200] workqueue events_power_efficient: flags=0x80
[ 1752.029205] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=5/256 refcnt=6
[ 1752.029221] in-flight: 53:fb_flashcursor fb_flashcursor
[ 1752.029239] pending: neigh_periodic_work, neigh_periodic_work, do_cache_clean
[ 1752.029269] workqueue mm_percpu_wq: flags=0x8
[ 1752.029272] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 1752.029288] pending: vmstat_update
[ 1752.029306] workqueue edac-poller: flags=0xa000a
[ 1752.029310] pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1 refcnt=4
[ 1752.029323] pending: edac_mc_workq_function
[ 1752.029337] inactive: edac_device_workq_function
[ 1752.029353] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=100s workers=3 idle: 7621 6478
[ 1752.029378] pool 4: cpus=0-1 flags=0x4 nice=0 hung=102s workers=3 idle: 6967 5672
stress-ng: info: [7740] successful run completed in 125.31s (2 mins, 5.31 secs)
Earlier kernels (e.g. v5.13.9) froze the machine completely, causing
the watchdog to trigger and reboot it. So something was apparently
already fixed here.
Bisecting leads to the commit quoted above; reverting it results in a
BUG-free run of the stress-ng test.
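For completeness, the revert I tested is simply the hunk quoted above
applied in reverse, i.e. switching the IPI flow handler back:

-				    handle_percpu_devid_irq,
+				    handle_percpu_devid_fasteoi_ipi,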
Any idea what might cause this and how to fix it?
Best regards,
Steffen Trumtrar
--
Pengutronix e.K. | Dipl.-Inform. Steffen Trumtrar |
Steuerwalder Str. 21 | https://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686| Fax: +49-5121-206917-5555 |