[bug report] KVM: arm64: vgic-v4: Occasionally issue VMOVP to an unmapped VPE on GICv4.1
Marc Zyngier
maz at kernel.org
Mon Sep 30 07:09:48 PDT 2024
On Mon, 30 Sep 2024 07:25:28 +0100,
Kunkun Jiang <jiangkunkun at huawei.com> wrote:
>
> Hi Marc,
>
> On 2024/9/29 18:07, Marc Zyngier wrote:
> > On Sun, 29 Sep 2024 08:18:41 +0100,
> > Kunkun Jiang <jiangkunkun at huawei.com> wrote:
> >>
> >> Hi all,
> >>
> >> I found a problem where a VMOVP is occasionally issued to an unmapped
> >> VPE on GICv4.1. In my test environment, operating on an unmapped VPE
> >> raises a RAS error, which is how I found this problem. The detailed
> >> analysis is as follows.
> >>
> >> vgic_v4_teardown() is executed when the VM is destroyed, to free
> >> the GICv4 data structures. The code is as follows:
> >>> /**
> >>>  * vgic_v4_teardown - Free the GICv4 data structures
> >>>  * @kvm:	Pointer to the VM being destroyed
> >>>  */
> >>> void vgic_v4_teardown(struct kvm *kvm)
> >>> {
> >>> 	struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
> >>> 	int i;
> >>>
> >>> 	lockdep_assert_held(&kvm->arch.config_lock);
> >>>
> >>> 	if (!its_vm->vpes)
> >>> 		return;
> >>>
> >>> 	for (i = 0; i < its_vm->nr_vpes; i++) {
> >>> 		struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
> >>> 		int irq = its_vm->vpes[i]->irq;
> >>>
> >>> 		irq_clear_status_flags(irq, DB_IRQ_FLAGS);
> >>> 		free_irq(irq, vcpu);
> >>> 	}
> >>>
> >>> 	its_free_vcpu_irqs(its_vm);
> >>> 	kfree(its_vm->vpes);
> >>> 	its_vm->nr_vpes = 0;
> >>> 	its_vm->vpes = NULL;
> >>> }
> >>
> >> [1] In irq_clear_status_flags(irq, DB_IRQ_FLAGS), the status flags of
> >> a doorbell are cleared. DB_IRQ_FLAGS contains IRQ_NO_BALANCING, so
> >> after this, irqbalance.service can move the doorbell.
> >> [2] In free_irq(), the VPE is unmapped.
> >> [3] In its_free_vcpu_irqs(its_vm), unregister_irq_proc() is called to
> >> delete the contents in /proc/irq/xx/ of the doorbell.
> >>
> >> For VPEs in a large-scale VM, there is a certain time window between [2]
> >> and [3]. irqbalance.service gets a chance to move the doorbell in
> >> that window, so a VMOVP is issued to an unmapped VPE.
> >>
> >> I tried not clearing IRQ_NO_BALANCING and the problem was solved. But
> >> it's not clear if there's any other problem with doing so.
> >
> > I don't think that's a good idea, because whoever requests the same
> > interrupt number again for a different purpose will have the flag set
> > and will experience odd behaviours.
> >
> > I'd rather fix it for good, given that we have all the necessary
> > tracking in place already. Something like the patch below, as usual
> > untested.
>
> After testing, the patch below fixes my problem.
Thanks. But it also introduces a regression on GICv4.0, which doesn't
use vmapp_count. So the fix is slightly more involved.
I'll try to post a more complete patch later this week.
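
For anyone following along, the window between [2] and [3] can be
modelled in a few lines of user-space C. This is only an illustrative
sketch, not the patch discussed in this thread and not kernel code:
struct model_vpe, the model_*() helpers and the check_mapped guard are
all made-up names. The guard merely stands in for "consult whatever
mapping state is already tracked before issuing the command"; it says
nothing about how that tracking differs between GICv4.0 and GICv4.1.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_vpe {
	bool mapped;		/* stands in for "VPE is mapped" */
	bool no_balancing;	/* stands in for IRQ_NO_BALANCING on the doorbell */
	bool proc_entry;	/* stands in for /proc/irq/xx/ still existing */
	int bad_vmovp;		/* modelled VMOVPs issued while unmapped */
};

/* Step [1]: the doorbell's status flags are cleared. */
static void model_clear_flags(struct model_vpe *v)
{
	v->no_balancing = false;
}

/* Step [2]: freeing the doorbell interrupt unmaps the VPE. */
static void model_free_irq(struct model_vpe *v)
{
	v->mapped = false;
}

/* Step [3]: the /proc entry for the doorbell is removed. */
static void model_unregister_proc(struct model_vpe *v)
{
	v->proc_entry = false;
}

/*
 * Models an affinity change requested from userspace.
 * Returns true if a (modelled) VMOVP would be issued.
 * check_mapped selects the assumed guard on the mapping state.
 */
static bool model_set_affinity(struct model_vpe *v, bool check_mapped)
{
	/* Userspace cannot reach the doorbell at all in these cases. */
	if (!v->proc_entry || v->no_balancing)
		return false;

	/* Assumed guard: skip the command for an unmapped VPE. */
	if (check_mapped && !v->mapped)
		return false;

	if (!v->mapped)
		v->bad_vmovp++;	/* the problematic case from the report */

	return true;
}
```

Running steps [1] and [2] and then calling model_set_affinity() before
step [3] reproduces the reported ordering: without the guard the model
counts a VMOVP against an unmapped VPE, with the guard it does not.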
M.
--
Without deviation from the norm, progress is not possible.