[PATCH v2 1/2] irqchip/gic-v4.1: Fix GICv4.1 doorbell affinity

Marc Zyngier maz at kernel.org
Wed Jan 31 06:05:32 PST 2024


On Wed, 31 Jan 2024 01:29:40 +0000,
Kunkun Jiang <jiangkunkun at huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2024/1/30 21:55, Marc Zyngier wrote:
> > On Tue, 30 Jan 2024 13:32:24 +0000,
> > Kunkun Jiang <jiangkunkun at huawei.com> wrote:
> >> Hi Marc,
> >> 
> >> On 2024/1/28 21:28, Marc Zyngier wrote:
> >>> On Sat, 27 Jan 2024 02:41:55 +0000,
> >>> Kunkun Jiang <jiangkunkun at huawei.com> wrote:
> >>>>> At this stage, I think the VMOVP optimisation is wrong and that we
> >>>>> should drop it.
> >>>> This is the last resort. Maybe we can think of another way.
> >>> I think the only other way is to never elide the VMOVP from
> >>> set_affinity, but to only call into set_affinity when strictly
> >>> necessary:
> >>> 
> >>> - when making the VPE resident on a CPU that isn't part of the same
> >>>     affinity group (VPE migration outside of the affinity group)
> >>> 
> >>> - when making the VPE non-resident on a CPU that isn't the doorbell
> >>>     delivery CPU *AND* that there is a need for a doorbell (VPE
> >>>     migration inside the affinity group)
> >>> 
> >>> Which means it is the responsibility of the upper layers of the stack
> >>> to make these decisions, and that we can keep the low-level driver as
> >>> simple as possible.
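
Purely as an illustration (nothing that exists in the tree today), the two
rules above could translate into upper-layer code along these lines, where
same_affinity_group() and vpe_needs_doorbell() are made-up helpers standing
in for whatever policy KVM would actually implement:

	static int sketch_vpe_resident(struct its_vpe *vpe, int cpu)
	{
		/* Rule 1: only move if we leave the current affinity group */
		if (same_affinity_group(vpe->col_idx, cpu))
			return 0;

		return irq_set_affinity(vpe->irq, cpumask_of(cpu));
	}

	static int sketch_vpe_non_resident(struct its_vpe *vpe, int db_cpu)
	{
		/* Rule 2: no doorbell needed means no reason to move... */
		if (!vpe_needs_doorbell(vpe))
			return 0;

		/* ...and if one is needed, only move if it isn't delivered there yet */
		if (vpe->col_idx == db_cpu)
			return 0;

		return irq_set_affinity(vpe->irq, cpumask_of(db_cpu));
	}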
> >>> 
> >>> In the meantime, here's the patch I'm proposing to merge ASAP. Please
> >>> let me know if that works for you.
> >> This patch works. But I still have some questions, which I will
> >> write down below.
> >>> 	M.
> >>> 
> >>> From 00c338e91d3b6600f402f56cdb64c9d370f911c8 Mon Sep 17 00:00:00 2001
> >>> From: Marc Zyngier <maz at kernel.org>
> >>> Date: Sun, 28 Jan 2024 13:05:24 +0000
> >>> Subject: [PATCH] irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
> >>> 
> >>> When updating the affinity of a VPE, we currently skip the VMOVP
> >>> command if the two CPUs are part of the same VPE affinity.
> >>> 
> >>> But this is wrong, as the doorbell corresponding to this VPE
> >>> is still delivered on the 'old' CPU, which screws up the balancing.
> >>> Furthermore, offlining that 'old' CPU results in doorbell interrupts
> >>> generated for this VPE being discarded.
> >>> 
> >>> The harsh reality is that we cannot easily elide VMOVP when
> >>> a set_affinity request occurs. It needs to be obeyed, and if
> >>> an optimisation is to be made, it is at the point where the affinity
> >>> change request is made (such as in KVM).
> >>> 
> >>> Drop the VMOVP elision altogether, and only use the vpe_table_mask
> >>> to try and stay within the same ITS affinity group if at all possible.
> >>> 
> >>> Fixes: dd3f050a216e ("irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP")
> >>> Reported-by: Kunkun Jiang <jiangkunkun at huawei.com>
> >>> Signed-off-by: Marc Zyngier <maz at kernel.org>
> >>> Cc: stable at vger.kernel.org
> >>> ---
> >>>    drivers/irqchip/irq-gic-v3-its.c | 22 +++++++++++++---------
> >>>    1 file changed, 13 insertions(+), 9 deletions(-)
> >>> 
> >>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> >>> index 250b4562f308..78aece62bff5 100644
> >>> --- a/drivers/irqchip/irq-gic-v3-its.c
> >>> +++ b/drivers/irqchip/irq-gic-v3-its.c
> >>> @@ -3826,8 +3826,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
> >>>    				bool force)
> >>>    {
> >>>    	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
> >>> -	int from, cpu = cpumask_first(mask_val);
> >>> +	struct cpumask common;
> >>>    	unsigned long flags;
> >>> +	int from, cpu;
> >>>  
> >>>  	/*
> >>>    	 * Changing affinity is mega expensive, so let's be as lazy as
> >>> @@ -3843,19 +3844,22 @@ static int its_vpe_set_affinity(struct irq_data *d,
> >>>    	 * taken on any vLPI handling path that evaluates vpe->col_idx.
> >>>    	 */
> >>>    	from = vpe_to_cpuid_lock(vpe, &flags);
> >>> -	if (from == cpu)
> >>> -		goto out;
> >>> -
> >>> -	vpe->col_idx = cpu;
> >>>  
> >>>  	/*
> >>> -	 * GICv4.1 allows us to skip VMOVP if moving to a cpu whose RD
> >>> -	 * is sharing its VPE table with the current one.
> >>> +	 * If we are offered another CPU in the same GICv4.1 ITS
> >>> +	 * affinity, pick this one. Otherwise, any CPU will do.
> >>>    	 */
> >>> -	if (gic_data_rdist_cpu(cpu)->vpe_table_mask &&
> >>> -	    cpumask_test_cpu(from, gic_data_rdist_cpu(cpu)->vpe_table_mask))
> >>> +	if (gic_data_rdist_cpu(from)->vpe_table_mask &&
> >>> +	    cpumask_and(&common, mask_val, gic_data_rdist_cpu(from)->vpe_table_mask))
> >>> +		cpu = cpumask_first(&common);
> >>> +	else
> >>> +		cpu = cpumask_first(mask_val);
> >>> +
> >>> +	if (from == cpu)
> >>>    		goto out;
> >> The meaning here should be that VMOVP will be skipped
> >> only if 'from' and the target CPU are the same. Then why
> >> do we still need to check whether they are in the same
> >> GICv4.1 ITS affinity group?
> > Having the same ITS affinity is a performance optimisation, as this
> > minimises traffic between ITSs and keeps cache locality. If the
> > proposed affinity contains CPUs that are in the same affinity group as
> > the current one, then it makes sense to stay there if possible in
> > order to keep the benefit of the warm translation caches.
> OK, I see. In another case, is it possible to skip VMOVP:
> when 'from' belongs to 'common' but is not the first CPU in it?

You could have something like

		if (cpumask_test_cpu(from, &common))
			cpu = from;
		else
			cpu = cpumask_first(&common);
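
For illustration only, folded into the patch above, the whole CPU selection
would then read roughly like this (untested sketch):

	/* Stay within the same ITS affinity group whenever the mask allows it */
	if (gic_data_rdist_cpu(from)->vpe_table_mask &&
	    cpumask_and(&common, mask_val, gic_data_rdist_cpu(from)->vpe_table_mask)) {
		/* Prefer not moving at all if 'from' is already an acceptable target */
		if (cpumask_test_cpu(from, &common))
			cpu = from;
		else
			cpu = cpumask_first(&common);
	} else {
		cpu = cpumask_first(mask_val);
	}

	if (from == cpu)
		goto out;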

	M.

-- 
Without deviation from the norm, progress is not possible.


