[PATCH 2/2] irqchip/sifive-plic: Fix the interrupt was enabled accidentally issue.

Thomas Gleixner tglx at linutronix.de
Fri Oct 16 09:42:27 EDT 2020


On Thu, Oct 15 2020 at 22:23, Marc Zyngier wrote:
> On 2020-10-12 14:57, Greentime Hu wrote:
>> In commit 2ca0b460bbcb ("genirq/affinity: Make affinity setting if
>> activated opt-in"),

This commit sha does not exist in mainline. Referencing a random stable
kernel tree is useless.

>> it added irqd_affinity_on_activate() checking in the function
>> irq_set_affinity_deactivated() so it will return false here.
>> In that case, it will call irq_try_set_affinity() -> plic_irq_toggle()
>> which will enable the interrupt to cause the CPU hang.
>> @@ -183,10 +183,14 @@ static int plic_irqdomain_map(struct irq_domain
>> *d, unsigned int irq,
>>  			      irq_hw_number_t hwirq)
>>  {
>>  	struct plic_priv *priv = d->host_data;
>> +	struct irq_data *irqd;
>> 
>>  	irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
>>  			    handle_fasteoi_irq, NULL, NULL);
>>  	irq_set_noprobe(irq);
>> +	irqd = irq_get_irq_data(irq);
>> +	irqd_set_single_target(irqd);
>> +	irqd_set_affinity_on_activate(irqd);
>>  	irq_set_affinity(irq, &priv->lmask);
>>  	return 0;
>>  }
>
> How does this fix anything? The plic driver doesn't have an activate
> callback, so how does it make any difference? I get the feeling this
> papers over another issue.

The existence of the activate callback is not really the interesting
part. And of course the analysis is completely bogus.

The referenced commit is a follow up to:

baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly")

which was added on Jul 17 and the fix was added on Jul 24:

f0c7baca1800 ("genirq/affinity: Make affinity setting if activated opt-in")

The latter restored the original behaviour within 7 days, except for
x86. And the original behaviour was that the core did not care about the
activated state at all. It just invoked irqchip->irq_set_affinity()
unconditionally, which is still does except when the interrupt is marked
with irqd_set_affinity_on_activate().

So the plic code had this problem before that change already:

 	irq_set_noprobe(irq);
 	irq_set_affinity(irq, &priv->lmask);

Mapping happens way before anything else, so Hu is definitely barking up
the wrong tree.

What the patch works around is the irq_set_affinity() callback which is
completely bogus and still broken even after that duct tape which"
fixes" the boot fail.

irq_set_affinity() can be called on masked interrupts so this:

        plic_irq_toggle(&priv->lmask, d, 0);
        plic_irq_toggle(cpumask_of(cpu), d, 1);

will always unconditionally unmask the interrupt when affinity is
changed.

        plic_irq_toggle(cpumask_of(cpu), d, !irqd_irq_masked(d));

is all it needs to work everywhere. Obvious, right?

The changelog wants to be "...: Fix broken irq_set_affinity() callback"
and the Fixes: tag referencing the commit which introduced that
unconditional unmask. Oh well.

Thanks,

        tglx











More information about the linux-riscv mailing list