[PATCH v2 1/2] irqchip/gic-v4.1: Fix GICv4.1 doorbell affinity

Kunkun Jiang jiangkunkun at huawei.com
Fri Jan 26 18:41:55 PST 2024


Hi Marc,

On 2024/1/26 18:52, Marc Zyngier wrote:
> On Fri, 26 Jan 2024 10:30:11 +0000,
> Kunkun Jiang <jiangkunkun at huawei.com> wrote:
>> Commit dd3f050a216e made an optimization: VMOVP can be skipped when
>> moving a VPE to a CPU whose RD shares its VPE table with the current
>> one. But when VMOVP is skipped, the affinity recorded in irq_data is
>> still updated. This makes the doorbell affinity recorded in irq_data
>> inconsistent with the actual hardware state.
>>
>> In a corner case, this may result in lost interrupts:
>> 0. Each CPU die shares a VPE table and contains 32 CPUs:
>>     die0(CPU0-31) die1(CPU32-63)...
>> 1. The VPE resides on CPU32, and the doorbell is affine to CPU32.
>> 2. Move the VPE to CPU33. VMOVP is skipped, so the doorbell is still
>>     affine to CPU32, but the affinity recorded in irq_data is CPU33.
>> 3. Manually offline CPU32 on the host side:
>>     'echo 0 > /sys/devices/system/cpu/cpu32/online'
>> 4. The core code does not move the doorbell away from CPU32, since
>>     the affinity recorded in irq_data is CPU33.
>> 5. Subsequent doorbell interrupts will be lost.
>>
>> So the affinity recorded in irq_data should not be updated when
>> VMOVP is skipped.
>>
>> Fixes: dd3f050a216e ("irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP")
>> Signed-off-by: Kunkun Jiang <jiangkunkun at huawei.com>
>> ---
>>   drivers/irqchip/irq-gic-v3-its.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index d097001c1e3e..4b1dbb697959 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -3850,8 +3850,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
>>   	its_send_vmovp(vpe);
>>   	its_vpe_db_proxy_move(vpe, from, cpu);
>>   
>> -out:
>>   	irq_data_update_effective_affinity(d, cpumask_of(cpu));
>> +
>> +out:
>>   	vpe_to_cpuid_unlock(vpe, flags);
>>   
>>   	return IRQ_SET_MASK_OK_DONE;
> This is becoming *very* annoying. I've already told you this patch was
> wrong, yet you keep sending it (twice in just over an hour).
Sorry, I hadn't seen your reply when I sent it for the second time.
I did not intend to do this, and I sincerely apologize.

Please allow me to quote your previous reply here for discussion:
> That looks wrong. You are lying to the core code by saying that it's
> all OK, and yet haven't done *anything*. This stuff is obviously
> buggy, but I don't think this is right.
Makes sense. I understand that the root cause of this problem is a
mismatch between the doorbell affinity recorded in the kernel and the
actual hardware state. So I'm just trying to keep the two consistent.
> In your example, you don't even solve the problem: if CPUs 32 and 33
> are part of the same ITS affinity group, you won't issue a VMOVP
> either, so this doesn't fix anything.
No VMOVP is sent, but the affinity recorded in irq_data stays correct
(CPU32). So when CPU32 is offlined, the core code can move the doorbell
to CPU0. The mask_val is the entire CPU range except the offline CPU.
> At this stage, I think the VMOVP optimisation is wrong and that we
> should drop it.
That should be the last resort. Maybe we can think of another way.

Kunkun Jiang




More information about the linux-arm-kernel mailing list