[bug report] KVM: arm64: vgic-v4: Occasionally issue VMOVP to an unmapped VPE on GICv4.1

Kunkun Jiang jiangkunkun at huawei.com
Sun Sep 29 23:25:28 PDT 2024


Hi Marc,

On 2024/9/29 18:07, Marc Zyngier wrote:
> On Sun, 29 Sep 2024 08:18:41 +0100,
> Kunkun Jiang <jiangkunkun at huawei.com> wrote:
>>
>> Hi all,
>>
>> I found a problem with occasionally issuing VMOVP to an unmapped VPE
>> on GICv4.1. In my test environment, operating an unmapped VPE will
>> generate RAS, so I found this problem. The detailed analysis is as
>> follows.
>>
>> The vgic_v4_teardown() will be executed when VM is destroyed to free
>> the GICv4 data structures. The code is as follows:
>>> /**
>>>   * vgic_v4_teardown - Free the GICv4 data structures
>>>   * @kvm:        Pointer to the VM being destroyed
>>>   */
>>> void vgic_v4_teardown(struct kvm *kvm)
>>> {
>>>          struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
>>>          int i;
>>>
>>>          lockdep_assert_held(&kvm->arch.config_lock);
>>>
>>>          if (!its_vm->vpes)
>>>                  return;
>>>
>>>          for (i = 0; i < its_vm->nr_vpes; i++) {
>>>                  struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
>>>                  int irq = its_vm->vpes[i]->irq;
>>>
>>>                  irq_clear_status_flags(irq, DB_IRQ_FLAGS);
>>>                  free_irq(irq, vcpu);
>>>          }
>>>
>>>          its_free_vcpu_irqs(its_vm);
>>>          kfree(its_vm->vpes);
>>>          its_vm->nr_vpes = 0;
>>>          its_vm->vpes = NULL;
>>> }
>>
>> [1] In irq_clear_status_flags(irq, DB_IRQ_FLAGS), the status flags of
>> a doorbell are cleared. DB_IRQ_FLAGS contains IRQ_NO_BALANCING. So
>> after this,the irqbalance.service can schedule the doorbell.
>> [2] In free_irq(), the VPE is unmaped.
>> [3] In its_free_vcpu_irqs(its_vm), unregister_irq_proc() is called to
>> delete the contents in /proc/irq/xx/ of the doorbell.
>>
>> For VPEs in large-scale VM, there is a centain time window between [2]
>> and [3]. The irqbalance.service got a chance to schedule the
>> doorbell. Therefore, the VMOVP is issued to an unmapped VPE.
>>
>> I tried not clearing IRQ_NO_BALANCING and the problem was solved. But
>> it's not clear if there's any other problem with doing so.
> 
> I don't think that's a good idea, because whoever request the same
> interrupt number again for a different purpose will have the flag set
> and will experience odd behaviours.
> 
> I'd rather fix it for good, given that we have all the necessary
> tracking in place already. Something like the patch below, as usual
> untested.

After testing, the patch below fixes my problem.

Thanks,
Kunkun Jiang

> 
> Thanks,
> 
> 	M.
> 
>>From c0fe216c50651458fdf1ebb6b3a1e4ffe2bcc6c2 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz at kernel.org>
> Date: Sun, 29 Sep 2024 10:58:19 +0100
> Subject: [PATCH] irqchip/gic-v4: Don't allow a VMOVP on a dying VPE
> 
> Kunkun Jiang reports that there is a small window of opportunity for
> userspace to force a change of affinity for a VPE while the VPE has
> already been unmapped, but the corresponding doorbell interrupt still
> visible in /proc/irq/.
> 
> Plug the race by checking the value of vmapp_count, which tracks whether
> the VPE is mapped ot not, and returning an error in this case.
> 
> Reported-by: Kunkun Jiang <jiangkunkun at huawei.com>
> Signed-off-by: Marc Zyngier <maz at kernel.org>
> Link: https://lore.kernel.org/r/c182ece6-2ba0-ce4f-3404-dba7a3ab6c52@huawei.com
> ---
>   drivers/irqchip/irq-gic-v3-its.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index fdec478ba5e7..ba9734d1a41f 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3806,6 +3806,13 @@ static int its_vpe_set_affinity(struct irq_data *d,
>   	struct cpumask *table_mask;
>   	unsigned long flags;
>   
> +	/*
> +	 * Check if we're racing against a VPE being destroyed, for
> +	 * which we don't want to allow a VMOVP.
> +	 */
> +	if (!atomic_read(&vpe->vmapp_count))
> +		return -EINVAL;
> +
>   	/*
>   	 * Changing affinity is mega expensive, so let's be as lazy as
>   	 * we can and only do it if we really have to. Also, if mapped
> 



More information about the linux-arm-kernel mailing list