[PATCH v2 1/2] irqchip/gic-v4.1: Fix GICv4.1 doorbell affinity
Marc Zyngier
maz at kernel.org
Wed Jan 31 06:05:32 PST 2024
On Wed, 31 Jan 2024 01:29:40 +0000,
Kunkun Jiang <jiangkunkun at huawei.com> wrote:
>
> Hi Marc,
>
> On 2024/1/30 21:55, Marc Zyngier wrote:
> > On Tue, 30 Jan 2024 13:32:24 +0000,
> > Kunkun Jiang <jiangkunkun at huawei.com> wrote:
> >> Hi Marc,
> >>
> >> On 2024/1/28 21:28, Marc Zyngier wrote:
> >>> On Sat, 27 Jan 2024 02:41:55 +0000,
> >>> Kunkun Jiang <jiangkunkun at huawei.com> wrote:
> >>>>> At this stage, I think the VMOVP optimisation is wrong and that we
> >>>>> should drop it.
> >>>> This is the last resort. Maybe we can think of another way.
> >>> I think the only other way is to never elide the VMOVP from
> >>> set_affinity, but to only call into set_affinity when strictly
> >>> necessary:
> >>>
> >>> - when making the VPE resident on a CPU that isn't part of the same
> >>> affinity group (VPE migration outside of the affinity group)
> >>>
> >>> - when making the VPE non-resident on a CPU that isn't the doorbell
> >>> delivery CPU *AND* that there is a need for a doorbell (VPE
> >>> migration inside the affinity group)
> >>>
> >>> Which means it is the responsibility of the upper layers of the stack
> >>> to make these decisions, and that we can keep the low-level driver as
> >>> simple as possible.
> >>>
> >>> In the meantime, here's the patch I'm proposing to merge ASAP. Please
> >>> let me know if that works for you.
> >> This patch works. But I still have some questions, which I will
> >> write down below.
> >>> M.
> >>>
> >>> >From 00c338e91d3b6600f402f56cdb64c9d370f911c8 Mon Sep 17 00:00:00 2001
> >>> From: Marc Zyngier <maz at kernel.org>
> >>> Date: Sun, 28 Jan 2024 13:05:24 +0000
> >>> Subject: [PATCH] irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
> >>>
> >>> When updating the affinity of a VPE, we currently skip the VMOVP
> >>> command if the two CPUs are part of the same VPE affinity.
> >>>
> >>> But this is wrong, as the doorbell corresponding to this VPE
> >>> is still delivered on the 'old' CPU, which screws up the balancing.
> >>> Furthermore, offlining that 'old' CPU results in doorbell interrupts
> >>> generated for this VPE being discarded.
> >>>
> >>> The harsh reality is that we cannot easily elide VMOVP when
> >>> a set_affinity request occurs. It needs to be obeyed, and if
> >>> an optimisation is to be made, it is at the point where the affinity
> >>> change request is made (such as in KVM).
> >>>
> >>> Drop the VMOVP elision altogether, and only use the vpe_table_mask
> >>> to try and stay within the same ITS affinity group if at all possible.
> >>>
> >>> Fixes: dd3f050a216e (irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP)
> >>> Reported-by: Kunkun Jiang <jiangkunkun at huawei.com>
> >>> Signed-off-by: Marc Zyngier <maz at kernel.org>
> >>> Cc: stable at vger.kernel.org
> >>> ---
> >>> drivers/irqchip/irq-gic-v3-its.c | 22 +++++++++++++---------
> >>> 1 file changed, 13 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> >>> index 250b4562f308..78aece62bff5 100644
> >>> --- a/drivers/irqchip/irq-gic-v3-its.c
> >>> +++ b/drivers/irqchip/irq-gic-v3-its.c
> >>> @@ -3826,8 +3826,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
> >>> bool force)
> >>> {
> >>> struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
> >>> - int from, cpu = cpumask_first(mask_val);
> >>> + struct cpumask common;
> >>> unsigned long flags;
> >>> + int from, cpu;
> >>> /*
> >>> * Changing affinity is mega expensive, so let's be as lazy as
> >>> @@ -3843,19 +3844,22 @@ static int its_vpe_set_affinity(struct irq_data *d,
> >>> * taken on any vLPI handling path that evaluates vpe->col_idx.
> >>> */
> >>> from = vpe_to_cpuid_lock(vpe, &flags);
> >>> - if (from == cpu)
> >>> - goto out;
> >>> -
> >>> - vpe->col_idx = cpu;
> >>> /*
> >>> - * GICv4.1 allows us to skip VMOVP if moving to a cpu whose RD
> >>> - * is sharing its VPE table with the current one.
> >>> + * If we are offered another CPU in the same GICv4.1 ITS
> >>> + * affinity, pick this one. Otherwise, any CPU will do.
> >>> */
> >>> - if (gic_data_rdist_cpu(cpu)->vpe_table_mask &&
> >>> - cpumask_test_cpu(from, gic_data_rdist_cpu(cpu)->vpe_table_mask))
> >>> + if (gic_data_rdist_cpu(from)->vpe_table_mask &&
> >>> + cpumask_and(&common, mask_val, gic_data_rdist_cpu(from)->vpe_table_mask))
> >>> + cpu = cpumask_first(&common);
> >>> + else
> >>> + cpu = cpumask_first(mask_val);
> >>> +
> >>> + if (from == cpu)
> >>> goto out;
> >> The meaning here should be that VMOVP will be skipped
> >> only if from and target are the same. Then why do we
> >> still to judge whether they are in the same GICv4.1
> >> ITS affinity?
> > Having the same ITS affinity is a performance optimisation, as this
> > minimises traffic between ITSs and keep cache locality. If the
> > proposed affinity contains CPUs that are in the same affinity as
> > current one, then it makes sense to stay there if possible in order to
> > keep the benefit of the warm translation caches.
> Ok, I see. In another case, is it possible to skip VMOVP?
> The from belongs to common, but it is not the first one.
You could have something like
if (cpumask_test_cpu(from, &common))
cpu = from;
else
cpu = cpumask_first(&common);
M.
--
Without deviation from the norm, progress is not possible.
More information about the linux-arm-kernel
mailing list