[RFC PATCH] irqchip/gic-v3: wait irq done to set affinity

Yipeng Zou zouyipeng at huawei.com
Mon Jan 9 04:26:35 PST 2023


On 2023/1/6 19:55, Marc Zyngier wrote:
> On Fri, 06 Jan 2023 08:21:36 +0000,
> Yipeng Zou <zouyipeng at huawei.com> wrote:
>> Recently we hit a problem with GIC affinity setting in our tests.
>>
>> This patch is only meant to start a discussion about the problem.
>>
>> Currently, the GIC set-affinity implementation takes effect
>> immediately, without checking whether any irq is being processed.
>>
>> This leads to a problem. Consider this scenario:
>>
>> 1. First, an irq is generated by a device.
>>
>> 2. While this irq is being processed (after handle event, before the
>> IRQD_IRQ_INPROGRESS flag is cleared), we modify the route and the GIC
>> applies it immediately; at the same time the irq is generated again.
> How is that possible?
>
> If it is affected by GICD_IROUTERn (as your patch suggests), then it
> is a SPI. If it is a SPI, it has an active state. Which means it
> cannot fire again without a deactivation (EOI if EOImode=0, EOI+DIR if
> EOImode=1) having taken place.
>
> So either something has deactivated the interrupt without masking it
> beforehand, or the active state is not honoured. Either way, this is
> wrong.
Yes, agreed. This is not possible in the SPI case.
>> 3. The new irq is processed on another cpu, different from the
>> old one.
>>
>> 4. The new irq is discarded because the IRQD_IRQ_INPROGRESS flag
>> is still set.
>>
>> I noticed that if we set IRQF_ONESHOT when registering the irq, this
>> problem goes away.
>>
>> But I'm also thinking about changing the gic_set_affinity function to wait
>> for the current irq to finish on all cpus before gic_write_irouter.
>> I'm not sure if that's appropriate.
> The base architecture should guarantee that this is not a problem,
> thanks to the active state. If that was a LPI (which do not have an
> active state), that'd be a different problem. But this doesn't seem to
> be the case here.

Hi, thanks very much for the reply.

I have rechecked our test. Actually, it was an LPI in our test case.

The problem is caused by the its_send_movi command.
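
To make the sequence concrete, here is a minimal user-space sketch of
steps 1-4 for an interrupt without an active state. It is only a model:
struct model_irq and the model_* functions are hypothetical names made up
for illustration, not kernel code, and the single in_progress flag merely
stands in for IRQD_IRQ_INPROGRESS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's struct irq_desc. */
struct model_irq {
	bool in_progress;  /* stands in for IRQD_IRQ_INPROGRESS */
	int  target_cpu;   /* stands in for the current LPI route */
	int  handled;      /* interrupts whose handler ran to completion */
	int  discarded;    /* interrupts dropped because of in_progress */
};

/* Entry of the flow handler: refuse to run if already in progress. */
static bool model_irq_enter(struct model_irq *irq)
{
	if (irq->in_progress) {
		irq->discarded++;
		return false;
	}
	irq->in_progress = true;
	return true;
}

/* Exit of the flow handler: count the interrupt and clear the flag. */
static void model_irq_exit(struct model_irq *irq)
{
	irq->handled++;
	irq->in_progress = false;
}

/* Replays steps 1-4 in order; returns how many interrupts were lost. */
static int model_replay_race(void)
{
	struct model_irq irq = { .in_progress = false, .target_cpu = 0 };
	bool first, second;

	first = model_irq_enter(&irq);   /* step 1: taken on CPU 0 */
	assert(first);

	irq.target_cpu = 1;              /* step 2: route moved immediately */
	second = model_irq_enter(&irq);  /* step 3: fires again, on CPU 1 */
	assert(!second);                 /* step 4: dropped, flag still set */

	model_irq_exit(&irq);            /* CPU 0 finally finishes */
	assert(irq.handled == 1);
	return irq.discarded;
}
```

In this model, model_replay_race() returns 1: two interrupts were raised
but only one was handled.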

I made a mistake when I modified the code. It should be as follows.
Sorry for misleading you.


diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 973ede0197e3..fad08ccb7fd9 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1667,6 +1667,9 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,

         /* don't set the affinity when the target cpu is same as current one */
         if (cpu != prev_cpu) {
+
+               // wait irq done on all cpus
+
                 target_col = &its_dev->its->collections[cpu];
                 its_send_movi(its_dev, target_col, id);
                 its_dev->event_map.col_map[id] = cpu;
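
The "wait irq done" placeholder above is not implemented yet. As a rough
illustration of the intent only, the user-space sketch below models it as
deferring the move until the running handler has finished, which is a
simplification of actually waiting. struct model_lpi and the model_*
functions are hypothetical names and not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one LPI; not kernel code. */
struct model_lpi {
	bool in_progress;  /* stands in for IRQD_IRQ_INPROGRESS */
	int  cpu;          /* stands in for event_map.col_map[id] */
	int  pending_cpu;  /* requested target not yet applied, or -1 */
};

/* Request a move; apply it only when no handler is running. */
static void model_request_move(struct model_lpi *lpi, int cpu)
{
	if (cpu == lpi->cpu)
		return;                  /* same-CPU check, as in the hunk */
	if (lpi->in_progress)
		lpi->pending_cpu = cpu;  /* "wait irq done": defer the move */
	else
		lpi->cpu = cpu;
}

/* Handler completion: clear the flag, then apply any deferred move. */
static void model_handler_done(struct model_lpi *lpi)
{
	lpi->in_progress = false;
	if (lpi->pending_cpu >= 0) {
		lpi->cpu = lpi->pending_cpu;
		lpi->pending_cpu = -1;
	}
}

/* Returns the CPU the LPI targets while the first handler still runs. */
static int model_cpu_during_handler(void)
{
	struct model_lpi lpi = { .in_progress = true, .cpu = 0, .pending_cpu = -1 };
	int cpu_during;

	model_request_move(&lpi, 1);
	cpu_during = lpi.cpu;        /* still 0: route unchanged mid-handler */

	model_handler_done(&lpi);
	assert(lpi.cpu == 1);        /* move applied once the handler is done */
	return cpu_during;
}
```

With the move deferred, the route does not change while the handler is
running, so a second interrupt cannot arrive on another CPU with the
in-progress flag still set. Whether something like this is acceptable in
the real driver is exactly what I would like to discuss.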

> I'm afraid to say that what you describe seems like a bug of some sort,
> either HW or SW.
>
> Thanks,
>
> 	M.

-- 
Regards,
Yipeng Zou



