A problem about SPI Interrupt Configuration

Thu Jan 21 23:57:41 PST 2016

On Wed, 20 Jan 2016 10:38:13 +0800
Yang Yingliang <yangyingliang at huawei.com> wrote:

Hi Yang,

> Hi, Marc
> 
> I got some error messages "RWP timeout, gone fishing".
> 
> The case is:
> 
> CPU0                                          CPU1
>                                               acquire desc->lock in __setup_irq()
>                                               enable irq in __setup_irq()
> read IAR in gic_handle_irq()
> wait for desc->lock in handle_fasteoi_irq()
>                                               call gic_set_affinity() from setup_affinity()
>                                               wait for the irq to become inactive
>                                                 in gic_do_wait_for_rwp()
> 
> 
> The hardware will not clear GICD_CTLR.RWP while the interrupt is
> active. The interrupt stays active while CPU0 waits for desc->lock,
> but that lock is held by CPU1, which is waiting for the interrupt to
> become inactive. The result is a deadlock that lasts until the 1s RWP
> timeout expires.
> 
> 
> And the GICv3 SPEC says:
> 
> 4.5.5 SPI Interrupt Configuration
> To configure an SPI interrupt, to ensure that interrupts are never
> distributed using partially updated configuration information,
> software must:
> o Ensure the interrupt is not active
> o Ensure that the interrupt is disabled
>   o This might be done either by writing to GICD_CTLR to clear the
>     enables for a group, or
>   o By writing to GICD_ICENABLERn to clear the Enable bit of the
>     interrupt (see section 5.3.11).
>   o In both cases, software must poll GICD_CTLR.RWP to ensure the
>     effects are visible (see section 5.3.20).
> o Program the routing (if appropriate), priority and group
> o Enable the interrupt (if required)
> 
> Since the spec says "Ensure the interrupt is not active", I cannot tell
> whether this is a hardware or a software problem.
> 
> Can you please give some advice?
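For reference, the 4.5.5 sequence quoted above could be sketched roughly as follows. This is a hypothetical user-space sketch, not the Linux GICv3 driver: the register offsets and the RWP bit follow the GICv3 distributor layout, but fake_gicd, configure_spi() and the accessor helpers are invented for illustration. The fake register file is plain memory, so it does not model the write-1-to-set/clear behaviour of GICD_ISENABLERn/GICD_ICENABLERn, and the poll budget is arbitrary. The sketch disables the interrupt before checking that it is not active, so no new activation can begin once the active check has passed; the group is left unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* GICv3 distributor offsets; GICD_CTLR.RWP is bit 31. */
#define GICD_CTLR       0x0000u
#define GICD_CTLR_RWP   (1u << 31)
#define GICD_ISENABLER  0x0100u
#define GICD_ICENABLER  0x0180u
#define GICD_ISACTIVER  0x0300u
#define GICD_IPRIORITYR 0x0400u
#define GICD_IROUTER    0x6000u

/* Hypothetical fake distributor: flat memory standing in for MMIO. */
static uint8_t fake_gicd[0x8000];
static uint32_t write_log[16];	/* offsets written, in program order */
static int nr_writes;

static void log_write(uint32_t off)
{
	if (nr_writes < 16)
		write_log[nr_writes++] = off;
}

static uint32_t rd32(uint32_t off)
{
	uint32_t v;
	memcpy(&v, &fake_gicd[off], sizeof(v));
	return v;
}

static void wr8(uint32_t off, uint8_t v) { fake_gicd[off] = v; log_write(off); }
static void wr32(uint32_t off, uint32_t v) { memcpy(&fake_gicd[off], &v, sizeof(v)); log_write(off); }
static void wr64(uint32_t off, uint64_t v) { memcpy(&fake_gicd[off], &v, sizeof(v)); log_write(off); }

/* Poll until (reg & mask) == 0; gives up after a fixed number of reads. */
static bool poll_clear(uint32_t off, uint32_t mask)
{
	for (int i = 0; i < 1000; i++)
		if (!(rd32(off) & mask))
			return true;
	return false;
}

/* Returns false if either poll times out. */
static bool configure_spi(unsigned int intid, uint64_t route, uint8_t prio)
{
	uint32_t bank = (intid / 32) * 4;
	uint32_t bit = 1u << (intid % 32);

	/* Disable the interrupt and wait for RWP so the disable is visible. */
	wr32(GICD_ICENABLER + bank, bit);
	if (!poll_clear(GICD_CTLR, GICD_CTLR_RWP))
		return false;

	/* Ensure the interrupt is not active. */
	if (!poll_clear(GICD_ISACTIVER + bank, bit))
		return false;

	/* Program routing and priority. */
	wr64(GICD_IROUTER + intid * 8, route);
	wr8(GICD_IPRIORITYR + intid, prio);

	/* Enable the interrupt again. */
	wr32(GICD_ISENABLER + bank, bit);
	return true;
}
```

Against real hardware the polls would need a delay between reads; here they only bound the number of reads so the sketch terminates.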

Thanks for the accurate description of the problem. This looks like a
core issue, or at least a mismatch between the core code and the way
the GIC behaves, unfortunately. The architecture expects the interrupt
to be fully configured before it is enabled and made active, while the
core code does it the other way around: __setup_irq() calls
irq_startup() before setup_affinity().
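A toy, single-threaded model of the two orderings may make the point more concrete. It is not kernel code: struct toy_irq, struct toy_desc, toy_setup_irq() and the other helpers are hypothetical, the two CPUs are interleaved by hand, and the model simply encodes the behaviour reported above (RWP stays set while the interrupt is active) rather than the real GIC. With irq_startup() first, the already-pending interrupt becomes active on CPU0, which then spins on desc->lock held by CPU1, so CPU1's RWP poll exhausts its budget. With the affinity written first, the interrupt is still disabled and cannot be active during the poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model of one SPI, not kernel code. */
struct toy_irq {
	bool enabled;	/* distributor enable bit */
	bool pending;	/* the device has already raised the line */
	bool active;	/* acked by CPU0 (IAR read) but not yet deactivated */
};

struct toy_desc {
	int lock_owner;	/* -1: desc->lock is free, else the owning CPU */
};

/* Stands in for the driver's RWP polling budget (about 1s per the report). */
#define TOY_RWP_BUDGET 1000

/* CPU0: gic_handle_irq() acks an enabled, pending interrupt. */
static void cpu0_ack(struct toy_irq *irq)
{
	if (irq->enabled && irq->pending && !irq->active) {
		irq->pending = false;
		irq->active = true;
	}
}

/* CPU0: handle_fasteoi_irq() needs desc->lock before it can finish. */
static void cpu0_try_handle(struct toy_irq *irq, struct toy_desc *desc)
{
	if (irq->active && desc->lock_owner == -1) {
		desc->lock_owner = 0;
		/* ... run the action handlers ... */
		desc->lock_owner = -1;
		irq->active = false;	/* EOI/deactivate */
	}
}

/*
 * CPU1: gic_set_affinity() then polls RWP, modelled as staying set while
 * the interrupt is active. CPU0 gets a chance to run on every iteration.
 */
static bool cpu1_set_affinity(struct toy_irq *irq, struct toy_desc *desc)
{
	for (int i = 0; i < TOY_RWP_BUDGET; i++) {
		cpu0_try_handle(irq, desc);
		if (!irq->active)
			return true;	/* RWP cleared */
	}
	fprintf(stderr, "RWP timeout, gone fishing\n");
	return false;
}

/* Returns true if the affinity update completed without an RWP timeout. */
static bool toy_setup_irq(bool startup_before_affinity)
{
	struct toy_irq irq = { .enabled = false, .pending = true, .active = false };
	struct toy_desc desc = { .lock_owner = -1 };
	bool ok;

	desc.lock_owner = 1;		/* CPU1 takes desc->lock in __setup_irq() */
	if (startup_before_affinity) {
		irq.enabled = true;	/* irq_startup() */
		cpu0_ack(&irq);		/* CPU0 takes the IRQ, then blocks on the lock */
		ok = cpu1_set_affinity(&irq, &desc);
	} else {
		ok = cpu1_set_affinity(&irq, &desc);	/* disabled, so not active */
		irq.enabled = true;	/* irq_startup() */
		cpu0_ack(&irq);
	}
	desc.lock_owner = -1;		/* CPU1 drops desc->lock */
	cpu0_try_handle(&irq, &desc);	/* CPU0 can now make progress */
	return ok;
}
```

Calling toy_setup_irq(true) reproduces the timeout; toy_setup_irq(false) completes, which is the ordering the patch below tries out.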

Can you please have a go at the patch below and let me know if it
improves things? This is just a test, and definitely not the complete
solution, but I'd like to find out if I'm on the right track.

Thanks,

	M.

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0eebaee..e5802fb 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1303,12 +1303,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 		if (new->flags & IRQF_ONESHOT)
 			desc->istate |= IRQS_ONESHOT;
 
-		if (irq_settings_can_autoenable(desc))
-			irq_startup(desc, true);
-		else
-			/* Undo nested disables: */
-			desc->depth = 1;
-
 		/* Exclude IRQ from balancing if requested */
 		if (new->flags & IRQF_NOBALANCING) {
 			irq_settings_set_no_balancing(desc);
@@ -1318,6 +1312,12 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 		/* Set default affinity mask once everything is setup */
 		setup_affinity(desc, mask);
 
+		if (irq_settings_can_autoenable(desc))
+			irq_startup(desc, true);
+		else
+			/* Undo nested disables: */
+			desc->depth = 1;
+
 	} else if (new->flags & IRQF_TRIGGER_MASK) {
 		unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK;
 		unsigned int omsk = irq_settings_get_trigger_mask(desc);

-- 
Without deviation from the norm, progress is not possible.