[bug report] KVM: arm64: vgic-v4: Occasionally issue VMOVP to an unmapped VPE on GICv4.1

Marc Zyngier maz at kernel.org
Mon Sep 30 07:09:48 PDT 2024


On Mon, 30 Sep 2024 07:25:28 +0100,
Kunkun Jiang <jiangkunkun at huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2024/9/29 18:07, Marc Zyngier wrote:
> > On Sun, 29 Sep 2024 08:18:41 +0100,
> > Kunkun Jiang <jiangkunkun at huawei.com> wrote:
> >> 
> >> Hi all,
> >> 
> > >> I found a problem where VMOVP is occasionally issued to an unmapped
> > >> VPE on GICv4.1. In my test environment, operating on an unmapped VPE
> > >> raises a RAS error, which is how I found this problem. The detailed
> > >> analysis is as follows.
> >> 
> > >> vgic_v4_teardown() is executed when a VM is destroyed, to free
> > >> the GICv4 data structures. The code is as follows:
> >>> /**
> >>>   * vgic_v4_teardown - Free the GICv4 data structures
> >>>   * @kvm:        Pointer to the VM being destroyed
> >>>   */
> >>> void vgic_v4_teardown(struct kvm *kvm)
> >>> {
> >>>          struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
> >>>          int i;
> >>> 
> >>>          lockdep_assert_held(&kvm->arch.config_lock);
> >>> 
> >>>          if (!its_vm->vpes)
> >>>                  return;
> >>> 
> >>>          for (i = 0; i < its_vm->nr_vpes; i++) {
> >>>                  struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
> >>>                  int irq = its_vm->vpes[i]->irq;
> >>> 
> >>>                  irq_clear_status_flags(irq, DB_IRQ_FLAGS);
> >>>                  free_irq(irq, vcpu);
> >>>          }
> >>> 
> >>>          its_free_vcpu_irqs(its_vm);
> >>>          kfree(its_vm->vpes);
> >>>          its_vm->nr_vpes = 0;
> >>>          its_vm->vpes = NULL;
> >>> }
> >> 
> > >> [1] In irq_clear_status_flags(irq, DB_IRQ_FLAGS), the status flags of
> > >> a doorbell are cleared. DB_IRQ_FLAGS contains IRQ_NO_BALANCING, so
> > >> after this, irqbalance.service can schedule the doorbell.
> > >> [2] In free_irq(), the VPE is unmapped.
> > >> [3] In its_free_vcpu_irqs(its_vm), unregister_irq_proc() is called to
> > >> delete the contents of /proc/irq/xx/ for the doorbell.
> >> 
> > >> For VPEs in a large-scale VM, there is a certain time window between
> > >> [2] and [3]. In that window, irqbalance.service gets a chance to
> > >> schedule the doorbell, and a VMOVP is issued to an unmapped VPE.
> >> 
> >> I tried not clearing IRQ_NO_BALANCING and the problem was solved. But
> >> it's not clear if there's any other problem with doing so.
> > 
> > > I don't think that's a good idea, because whoever requests the same
> > > interrupt number again for a different purpose will have the flag set
> > > and will experience odd behaviours.
> > 
> > I'd rather fix it for good, given that we have all the necessary
> > tracking in place already. Something like the patch below, as usual
> > untested.
> 
> After testing, the patch below fixes my problem.

Thanks. But it also introduces a regression on GICv4.0, which doesn't
use vmapp_count. So the fix is slightly more involved.

I'll try to post a more complete patch later this week.
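
For illustration only, the ordering window between [1], [2] and [3] can
be sketched as a small userspace model. This is not kernel code and not
the patch discussed in this thread: every name below (model_vpe,
model_set_affinity, the check_mapped switch and so on) is hypothetical,
and the "mapped" check merely stands in for the kind of tracking that
vmapp_count provides on GICv4.1. It does not model GICv4.0, which does
not use vmapp_count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_vpe {
	bool no_balancing;	/* stands in for IRQ_NO_BALANCING on the doorbell */
	bool mapped;		/* true while the VPE is mapped */
	bool proc_visible;	/* stands in for the /proc/irq/xx/ entry */
};

enum move_result {
	MOVE_REJECTED,		/* userspace cannot retarget the doorbell */
	MOVE_SKIPPED,		/* request accepted, but no VMOVP is issued */
	MOVE_ISSUED_MAPPED,	/* VMOVP issued to a mapped VPE (fine) */
	MOVE_ISSUED_UNMAPPED,	/* VMOVP issued to an unmapped VPE (the bug) */
};

/* Step [1]: clearing the doorbell flags re-enables balancing. */
static void model_clear_flags(struct model_vpe *vpe)
{
	assert(vpe != NULL);
	vpe->no_balancing = false;
}

/* Step [2]: freeing the doorbell interrupt unmaps the VPE. */
static void model_free_irq(struct model_vpe *vpe)
{
	assert(vpe != NULL);
	vpe->mapped = false;
}

/* Step [3]: the /proc entry for the doorbell goes away. */
static void model_unregister_proc(struct model_vpe *vpe)
{
	assert(vpe != NULL);
	vpe->proc_visible = false;
}

/*
 * What an affinity change requested by a balancer would do at this
 * point in the teardown. With check_mapped set, the move is skipped
 * when the VPE is no longer mapped (an assumed guard, see above).
 */
static enum move_result model_set_affinity(const struct model_vpe *vpe,
					   bool check_mapped)
{
	assert(vpe != NULL);

	if (!vpe->proc_visible || vpe->no_balancing)
		return MOVE_REJECTED;
	if (check_mapped && !vpe->mapped)
		return MOVE_SKIPPED;

	return vpe->mapped ? MOVE_ISSUED_MAPPED : MOVE_ISSUED_UNMAPPED;
}
```

Calling model_set_affinity() after model_free_irq() but before
model_unregister_proc() gives MOVE_ISSUED_UNMAPPED without the guard and
MOVE_SKIPPED with it, which is the window described in the report.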

	M.

-- 
Without deviation from the norm, progress is not possible.


