[BUG] irqchip: armada-370-xp: workqueue lockup

Marc Zyngier maz at kernel.org
Tue Sep 21 08:18:57 PDT 2021


Hi Steffen,

On Tue, 21 Sep 2021 09:40:59 +0100,
Steffen Trumtrar <s.trumtrar at pengutronix.de> wrote:
> 
> 
> Hi,
> 
> I noticed that after the patch
> 
>         e52e73b7e9f7d08b8c2ef6fb1657105093e22a03
>         From: Valentin Schneider <valentin.schneider at arm.com>
>         Date: Mon, 9 Nov 2020 09:41:18 +0000
>         Subject: [PATCH] irqchip/armada-370-xp: Make IPIs use
>         handle_percpu_devid_irq()
> 
>         As done for the Arm GIC irqchips, move IPIs to handle_percpu_devid_irq() as
>         handle_percpu_devid_fasteoi_ipi() isn't actually required.
> 
>         Signed-off-by: Valentin Schneider <valentin.schneider at arm.com>
>         Signed-off-by: Marc Zyngier <maz at kernel.org>
>         Link: https://lore.kernel.org/r/20201109094121.29975-3-valentin.schneider@arm.com
>         ---
>         drivers/irqchip/irq-armada-370-xp.c | 2 +-
>         1 file changed, 1 insertion(+), 1 deletion(-)
> 
>         diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
>         index d7eb2e93db8f..32938dfc0e46 100644
>         --- a/drivers/irqchip/irq-armada-370-xp.c
>         +++ b/drivers/irqchip/irq-armada-370-xp.c
>         @@ -382,7 +382,7 @@ static int armada_370_xp_ipi_alloc(struct irq_domain *d,
>                         irq_set_percpu_devid(virq + i);
>                         irq_domain_set_info(d, virq + i, i, &ipi_irqchip,
>                                         d->host_data,
>         -                                   handle_percpu_devid_fasteoi_ipi,
>         +                                   handle_percpu_devid_irq,
>                                         NULL, NULL);
>                 }
> 
> I get workqueue lockups on my Armada-XP based board.
> When I run the following test on v5.15-rc2
> 
>         stress-ng --cpu 8 --io 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 120s
>
> I get a backtrace like this:

[...]

> Earlier kernels (i.e v5.13.9) completely froze the machine resulting in
> the watchdog triggering and rebooting the machine. So, $something was
> already fixed here.

Fixed? Or broken? More likely the latter.

> Bisecting leads to the mentioned commit, reverting of the commit results
> in a BUG-less run of the stress-ng test.
> Any idea what might cause this and how to fix it?

It isn't obvious to me how reverting this patch fixes anything.  The
fasteoi flow does the same thing as far as the IPI driver is concerned.

However, it appears that I have broken that part much earlier in
f02147dd02eb ("irqchip/armada-370-xp: Configure IPIs as standard
interrupts"), as the write to ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS that
used to occur before the handling (an ACK) has now been moved after as
an EOI. That's a pretty good way to lose edge interrupts.
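
To illustrate the ordering problem, here is a userspace toy model, not
driver code. It assumes the cause register latches one bit per IPI and
that a second IPI can arrive while the handler for the first is still
running. The names cause, raise_ipi, clear_ipi, ack_then_handle and
handle_then_eoi are all made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a latched doorbell cause register. */
static uint32_t cause;

/* An incoming IPI sets its bit; a second one on a set bit is not counted. */
static void raise_ipi(unsigned int bit) { cause |= (1u << bit); }

/* The driver's write to the cause register clears the bit. */
static void clear_ipi(unsigned int bit) { cause &= ~(1u << bit); }

/* Clear before handling (ACK): a late IPI stays latched. */
static bool ack_then_handle(unsigned int bit)
{
	cause = 0;
	raise_ipi(bit);		/* first IPI fires */
	clear_ipi(bit);		/* ACK before the handler runs */
	raise_ipi(bit);		/* second IPI arrives during handling */
	return (cause & (1u << bit)) != 0;	/* still pending? */
}

/* Clear after handling (EOI): the late IPI is wiped with the first. */
static bool handle_then_eoi(unsigned int bit)
{
	cause = 0;
	raise_ipi(bit);		/* first IPI fires */
	raise_ipi(bit);		/* second IPI arrives during handling */
	clear_ipi(bit);		/* EOI after the handler clears both */
	return (cause & (1u << bit)) != 0;	/* still pending? */
}
```

In this model ack_then_handle() leaves the second IPI pending so it
gets handled on the next pass, while handle_then_eoi() reports nothing
pending and the second IPI is silently dropped.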

Could you try the following patch on top of 5.15-rc2?

Thanks,

	M.

diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index 7557ab551295..53e0fb0562c1 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -359,16 +359,16 @@ static void armada_370_xp_ipi_send_mask(struct irq_data *d,
 		ARMADA_370_XP_SW_TRIG_INT_OFFS);
 }
 
-static void armada_370_xp_ipi_eoi(struct irq_data *d)
+static void armada_370_xp_ipi_ack(struct irq_data *d)
 {
 	writel(~BIT(d->hwirq), per_cpu_int_base + ARMADA_370_XP_IN_DRBEL_CAUSE_OFFS);
 }
 
 static struct irq_chip ipi_irqchip = {
 	.name		= "IPI",
+	.irq_ack	= armada_370_xp_ipi_ack,
 	.irq_mask	= armada_370_xp_ipi_mask,
 	.irq_unmask	= armada_370_xp_ipi_unmask,
-	.irq_eoi	= armada_370_xp_ipi_eoi,
 	.ipi_send_mask	= armada_370_xp_ipi_send_mask,
 };
 

-- 
Without deviation from the norm, progress is not possible.