[PATCH v2 1/2] irqchip/gic-v4.1: Fix GICv4.1 doorbell affinity

Tue Jan 30 17:29:40 PST 2024

Hi Marc,

On 2024/1/30 21:55, Marc Zyngier wrote:
> On Tue, 30 Jan 2024 13:32:24 +0000,
> Kunkun Jiang <jiangkunkun at huawei.com> wrote:
>> Hi Marc,
>>
>> On 2024/1/28 21:28, Marc Zyngier wrote:
>>> On Sat, 27 Jan 2024 02:41:55 +0000,
>>> Kunkun Jiang <jiangkunkun at huawei.com> wrote:
>>>>> At this stage, I think the VMOVP optimisation is wrong and that we
>>>>> should drop it.
>>>> This is the last resort. Maybe we can think of another way.
>>> I think the only other way is to never elide the VMOVP from
>>> set_affinity, but to only call into set_affinity when strictly
>>> necessary:
>>>
>>> - when making the VPE resident on a CPU that isn't part of the same
>>>     affinity group (VPE migration outside of the affinity group)
>>>
>>> - when making the VPE non-resident on a CPU that isn't the doorbell
>>>     delivery CPU *AND* that there is a need for a doorbell (VPE
>>>     migration inside the affinity group)
>>>
>>> Which means it is the responsibility of the upper layers of the stack
>>> to make these decisions, and that we can keep the low-level driver as
>>> simple as possible.
>>>
>>> In the meantime, here's the patch I'm proposing to merge ASAP. Please
>>> let me know if that works for you.
>> This patch works. But I still have some questions, which I will
>> write down below.
>>> 	M.
>>>
>>> >From 00c338e91d3b6600f402f56cdb64c9d370f911c8 Mon Sep 17 00:00:00 2001
>>> From: Marc Zyngier <maz at kernel.org>
>>> Date: Sun, 28 Jan 2024 13:05:24 +0000
>>> Subject: [PATCH] irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
>>>
>>> When updating the affinity of a VPE, we currently skip the VMOVP
>>> command if the two CPUs are part of the same VPE affinity.
>>>
>>> But this is wrong, as the doorbell corresponding to this VPE
>>> is still delivered on the 'old' CPU, which screws up the balancing.
>>> Furthermore, offlining that 'old' CPU results in doorbell interrupts
>>> generated for this VPE being discarded.
>>>
>>> The harsh reality is that we cannot easily elide VMOVP when
>>> a set_affinity request occurs. It needs to be obeyed, and if
>>> an optimisation is to be made, it is at the point where the affinity
>>> change request is made (such as in KVM).
>>>
>>> Drop the VMOVP elision altogether, and only use the vpe_table_mask
>>> to try and stay within the same ITS affinity group if at all possible.
>>>
>>> Fixes: dd3f050a216e (irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP)
>>> Reported-by: Kunkun Jiang <jiangkunkun at huawei.com>
>>> Signed-off-by: Marc Zyngier <maz at kernel.org>
>>> Cc: stable at vger.kernel.org
>>> ---
>>>    drivers/irqchip/irq-gic-v3-its.c | 22 +++++++++++++---------
>>>    1 file changed, 13 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>>> index 250b4562f308..78aece62bff5 100644
>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>> @@ -3826,8 +3826,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
>>>    				bool force)
>>>    {
>>>    	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
>>> -	int from, cpu = cpumask_first(mask_val);
>>> +	struct cpumask common;
>>>    	unsigned long flags;
>>> +	int from, cpu;
>>>      	/*
>>>    	 * Changing affinity is mega expensive, so let's be as lazy as
>>> @@ -3843,19 +3844,22 @@ static int its_vpe_set_affinity(struct irq_data *d,
>>>    	 * taken on any vLPI handling path that evaluates vpe->col_idx.
>>>    	 */
>>>    	from = vpe_to_cpuid_lock(vpe, &flags);
>>> -	if (from == cpu)
>>> -		goto out;
>>> -
>>> -	vpe->col_idx = cpu;
>>>      	/*
>>> -	 * GICv4.1 allows us to skip VMOVP if moving to a cpu whose RD
>>> -	 * is sharing its VPE table with the current one.
>>> +	 * If we are offered another CPU in the same GICv4.1 ITS
>>> +	 * affinity, pick this one. Otherwise, any CPU will do.
>>>    	 */
>>> -	if (gic_data_rdist_cpu(cpu)->vpe_table_mask &&
>>> -	    cpumask_test_cpu(from, gic_data_rdist_cpu(cpu)->vpe_table_mask))
>>> +	if (gic_data_rdist_cpu(from)->vpe_table_mask &&
>>> +	    cpumask_and(&common, mask_val, gic_data_rdist_cpu(from)->vpe_table_mask))
>>> +		cpu = cpumask_first(&common);
>>> +	else
>>> +		cpu = cpumask_first(mask_val);
>>> +
>>> +	if (from == cpu)
>>>    		goto out;
>> The meaning here should be that VMOVP will be skipped
>> only if from and target are the same. Then why do we
>> still to judge whether they are in the same GICv4.1
>> ITS affinity?
> Having the same ITS affinity is a performance optimisation, as this
> minimises traffic between ITSs and keep cache locality. If the
> proposed affinity contains CPUs that are in the same affinity as
> current one, then it makes sense to stay there if possible in order to
> keep the benefit of the warm translation caches.
Ok, I see. In another case, is it possible to skip VMOVP?
The from belongs to common, but it is not the first one.

Kunkun Jiang