[bug report] KVM: arm64: vgic-v4: Occasionally issue VMOVP to an unmapped VPE on GICv4.1
Kunkun Jiang
jiangkunkun at huawei.com
Sun Sep 29 23:25:28 PDT 2024
Hi Marc,
On 2024/9/29 18:07, Marc Zyngier wrote:
> On Sun, 29 Sep 2024 08:18:41 +0100,
> Kunkun Jiang <jiangkunkun at huawei.com> wrote:
>>
>> Hi all,
>>
>> I found a problem with occasionally issuing VMOVP to an unmapped VPE
>> on GICv4.1. In my test environment, operating an unmapped VPE will
>> generate RAS, so I found this problem. The detailed analysis is as
>> follows.
>>
>> The vgic_v4_teardown() will be executed when VM is destroyed to free
>> the GICv4 data structures. The code is as follows:
>>> /**
>>> * vgic_v4_teardown - Free the GICv4 data structures
>>> * @kvm: Pointer to the VM being destroyed
>>> */
>>> void vgic_v4_teardown(struct kvm *kvm)
>>> {
>>> struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
>>> int i;
>>>
>>> lockdep_assert_held(&kvm->arch.config_lock);
>>>
>>> if (!its_vm->vpes)
>>> return;
>>>
>>> for (i = 0; i < its_vm->nr_vpes; i++) {
>>> struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
>>> int irq = its_vm->vpes[i]->irq;
>>>
>>> irq_clear_status_flags(irq, DB_IRQ_FLAGS);
>>> free_irq(irq, vcpu);
>>> }
>>>
>>> its_free_vcpu_irqs(its_vm);
>>> kfree(its_vm->vpes);
>>> its_vm->nr_vpes = 0;
>>> its_vm->vpes = NULL;
>>> }
>>
>> [1] In irq_clear_status_flags(irq, DB_IRQ_FLAGS), the status flags of
>> a doorbell are cleared. DB_IRQ_FLAGS contains IRQ_NO_BALANCING. So
>> after this,the irqbalance.service can schedule the doorbell.
>> [2] In free_irq(), the VPE is unmaped.
>> [3] In its_free_vcpu_irqs(its_vm), unregister_irq_proc() is called to
>> delete the contents in /proc/irq/xx/ of the doorbell.
>>
>> For VPEs in large-scale VM, there is a centain time window between [2]
>> and [3]. The irqbalance.service got a chance to schedule the
>> doorbell. Therefore, the VMOVP is issued to an unmapped VPE.
>>
>> I tried not clearing IRQ_NO_BALANCING and the problem was solved. But
>> it's not clear if there's any other problem with doing so.
>
> I don't think that's a good idea, because whoever request the same
> interrupt number again for a different purpose will have the flag set
> and will experience odd behaviours.
>
> I'd rather fix it for good, given that we have all the necessary
> tracking in place already. Something like the patch below, as usual
> untested.
After testing, the patch below fixes my problem.
Thanks,
Kunkun Jiang
>
> Thanks,
>
> M.
>
>>From c0fe216c50651458fdf1ebb6b3a1e4ffe2bcc6c2 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz at kernel.org>
> Date: Sun, 29 Sep 2024 10:58:19 +0100
> Subject: [PATCH] irqchip/gic-v4: Don't allow a VMOVP on a dying VPE
>
> Kunkun Jiang reports that there is a small window of opportunity for
> userspace to force a change of affinity for a VPE while the VPE has
> already been unmapped, but the corresponding doorbell interrupt still
> visible in /proc/irq/.
>
> Plug the race by checking the value of vmapp_count, which tracks whether
> the VPE is mapped ot not, and returning an error in this case.
>
> Reported-by: Kunkun Jiang <jiangkunkun at huawei.com>
> Signed-off-by: Marc Zyngier <maz at kernel.org>
> Link: https://lore.kernel.org/r/c182ece6-2ba0-ce4f-3404-dba7a3ab6c52@huawei.com
> ---
> drivers/irqchip/irq-gic-v3-its.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index fdec478ba5e7..ba9734d1a41f 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3806,6 +3806,13 @@ static int its_vpe_set_affinity(struct irq_data *d,
> struct cpumask *table_mask;
> unsigned long flags;
>
> + /*
> + * Check if we're racing against a VPE being destroyed, for
> + * which we don't want to allow a VMOVP.
> + */
> + if (!atomic_read(&vpe->vmapp_count))
> + return -EINVAL;
> +
> /*
> * Changing affinity is mega expensive, so let's be as lazy as
> * we can and only do it if we really have to. Also, if mapped
>
More information about the linux-arm-kernel
mailing list