[bug report] KVM: arm64: vgic-v4: Occasionally issue VMOVP to an unmapped VPE on GICv4.1

Ganapatrao Kulkarni gankulkarni at os.amperecomputing.com
Sun Sep 29 22:18:29 PDT 2024


Hi Kunkun,

On 29-09-2024 12:48 pm, Kunkun Jiang wrote:
> Hi all,
> 
> I found a problem where a VMOVP is occasionally issued to an unmapped VPE 
> on GICv4.1. In my test environment, operating on an unmapped VPE generates 
> a RAS error, which is how I found this problem. The detailed analysis is 
> as follows.
> 

May I know what specific RAS errors you are getting?

> vgic_v4_teardown() is executed when the VM is destroyed, to free the 
> GICv4 data structures. The code is as follows:
>> /**
>>  * vgic_v4_teardown - Free the GICv4 data structures
>>  * @kvm:        Pointer to the VM being destroyed
>>  */
>> void vgic_v4_teardown(struct kvm *kvm)
>> {
>>         struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
>>         int i;
>>
>>         lockdep_assert_held(&kvm->arch.config_lock);
>>
>>         if (!its_vm->vpes)
>>                 return;
>>
>>         for (i = 0; i < its_vm->nr_vpes; i++) {
>>                 struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
>>                 int irq = its_vm->vpes[i]->irq;
>>
>>                 irq_clear_status_flags(irq, DB_IRQ_FLAGS);
>>                 free_irq(irq, vcpu);
>>         }
>>
>>         its_free_vcpu_irqs(its_vm);
>>         kfree(its_vm->vpes);
>>         its_vm->nr_vpes = 0;
>>         its_vm->vpes = NULL;
>> }
> 
> [1] In irq_clear_status_flags(irq, DB_IRQ_FLAGS), the status flags of the 
> doorbell are cleared. DB_IRQ_FLAGS contains IRQ_NO_BALANCING, so after 
> this, irqbalance.service can rebalance the doorbell (change its affinity).
> [2] In free_irq(), the VPE is unmapped.
> [3] In its_free_vcpu_irqs(its_vm), unregister_irq_proc() is called to 
> delete the doorbell's entries under /proc/irq/xx/.
> 
> For VPEs in a large-scale VM, there is a noticeable time window between 
> [2] and [3]. irqbalance.service can rebalance the doorbell during this 
> window, and as a result a VMOVP is issued to an already unmapped VPE.
> 
> I tried not clearing IRQ_NO_BALANCING, and the problem was solved. However, 
> I am not sure whether doing so causes any other problems.
> 
> Thanks,
> Kunkun Jiang

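A minimal, single-threaded C model of the teardown ordering in the report may help make the race window concrete. It is only a sketch under stated assumptions: struct doorbell_model, balancer_can_move(), vmovp_hits_unmapped() and unsafe_windows() are hypothetical names, not kernel APIs. Following the report, it assumes irqbalance can act on the doorbell only while IRQ_NO_BALANCING is clear and the /proc/irq/xx/ entries still exist, and that any rebalance issues a VMOVP. After each of steps [1] to [3], it checks whether such a VMOVP could reach an unmapped VPE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified state of one vPE doorbell during
 * vgic_v4_teardown(). None of these names exist in the kernel. */
struct doorbell_model {
	bool no_balancing; /* IRQ_NO_BALANCING still set on the doorbell */
	bool vpe_mapped;   /* VPE still mapped in the ITS */
	bool proc_visible; /* /proc/irq/xx/ entries still present */
};

/* Assumption: irqbalance can only retarget the doorbell when it is not
 * marked IRQ_NO_BALANCING and its /proc entries still exist. */
static bool balancer_can_move(const struct doorbell_model *db)
{
	return !db->no_balancing && db->proc_visible;
}

/* Assumption: a rebalance issues a VMOVP; it is only harmful if the
 * VPE has already been unmapped. */
static bool vmovp_hits_unmapped(const struct doorbell_model *db)
{
	return balancer_can_move(db) && !db->vpe_mapped;
}

/* Walk steps [1]..[3] and count the points, after each step, at which a
 * rebalance would send a VMOVP to an unmapped VPE. */
static int unsafe_windows(bool clear_no_balancing)
{
	struct doorbell_model db = {
		.no_balancing = true,
		.vpe_mapped = true,
		.proc_visible = true,
	};
	int windows = 0;

	/* [1] irq_clear_status_flags(irq, DB_IRQ_FLAGS) */
	if (clear_no_balancing)
		db.no_balancing = false;
	windows += vmovp_hits_unmapped(&db);

	/* [2] free_irq(): the VPE is unmapped */
	db.vpe_mapped = false;
	windows += vmovp_hits_unmapped(&db);

	/* [3] its_free_vcpu_irqs(): unregister_irq_proc() */
	db.proc_visible = false;
	windows += vmovp_hits_unmapped(&db);

	/* Sanity check: the doorbell is fully torn down at the end. */
	assert(!db.vpe_mapped && !db.proc_visible);
	return windows;
}
```

With the current ordering, unsafe_windows(true) finds exactly one unsafe point (after [2], before [3]); with the workaround of keeping IRQ_NO_BALANCING set, unsafe_windows(false) finds none. The model only captures the ordering described in the report; it says nothing about whether keeping the flag set has other side effects, which is the open question.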
-- 
Thanks,
Ganapat/GK



More information about the linux-arm-kernel mailing list