Question on guest enable msi fail when using GICv4/4.1

Marc Zyngier maz at kernel.org
Fri May 7 04:02:57 PDT 2021


On Fri, 07 May 2021 10:58:23 +0100,
Shaokun Zhang <zhangshaokun at hisilicon.com> wrote:
> 
> Hi Marc,
> 
> Thanks for your quick reply.
> 
> On 2021/5/7 17:03, Marc Zyngier wrote:
> > On Fri, 07 May 2021 06:57:04 +0100,
> > Shaokun Zhang <zhangshaokun at hisilicon.com> wrote:
> >>
> >> [This letter comes from Nianyao Tang]
> >>
> >> Hi,
> >>
> >> Using GICv4/4.1 and msi capability, guest vf driver requires 3
> >> vectors and enable msi, will lead to guest stuck.
> > 
> > Stuck how?
> 
> Guest serial does not response anymore and guest network shutdown.
> 
> > 
> >> Qemu gets number of interrupts from Multiple Message Capable field
> >> set by guest. This field is aligned to a power of 2(if a function
> >> requires 3 vectors, it initializes it to 2).
> > 
> > So I guess this is a MultiMSI device with 4 vectors, right?
> > 
> 
> Yes, it can support maximum of 32 msi interrupts, and vf driver only use 3 msi.
> 
> >> However, guest driver just sends 3 mapi-cmd to vits and 3 ite
> >> entries is recorded in host.  Vfio initializes msi interrupts using
> >> the number of interrupts 4 provide by qemu.  When it comes to the
> >> 4th msi without ite in vits, in irq_bypass_register_producer,
> >> producer and consumer will __connect fail, due to find_ite fail, and
> >> do not resume guest.
> > 
> > Let me rephrase this to check that I understand it:
> > - The device has 4 vectors
> > - The guest only create mappings for 3 of them
> > - VFIO calls kvm_vgic_v4_set_forwarding() for each vector
> > - KVM doesn't have a mapping for the 4th vector and returns an error
> > - VFIO disable this 4th vector
> > 
> > Is that correct? If yes, I don't understand why that impacts the guest
> > at all. From what I can see, vfio_msi_set_vector_signal() just prints
> > a message on the console and carries on.
> > 
> 
> function calls:
> --> vfio_msi_set_vector_signal
>    --> irq_bypass_register_producer
>       -->__connect
> 
> in __connect, add_producer finally calls kvm_vgic_v4_set_forwarding
> and fails to get the 4th mapping. When add_producer fail, it does
> not call cons->start, calls kvm_arch_irq_bypass_start and then
> kvm_arm_resume_guest.

[+Eric, who wrote the irq_bypass infrastructure.]

Ah, so the guest is actually paused, not in a livelock situation
(which is how I interpreted "stuck").

I think we should handle this case gracefully, as there should be no
expectation that the guest will be using this interrupt. Given that
VFIO seems to be pretty unfazed when a producer fails, I'm temped to
do the same thing and restart the guest.

Also, __disconnect doesn't care about errors, so why should __connect
have this odd behaviour?

Can you please try this? It is completely untested (and I think the
del_consumer call is odd, which is why I've also dropped it).

Eric, what do you think?

Thanks,

	M.

diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index c9bb3957f58a..7e1865e15668 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -40,21 +40,14 @@ static int __connect(struct irq_bypass_producer *prod,
 	if (prod->add_consumer)
 		ret = prod->add_consumer(prod, cons);
 
-	if (ret)
-		goto err_add_consumer;
-
-	ret = cons->add_producer(cons, prod);
-	if (ret)
-		goto err_add_producer;
+	if (!ret)
+		ret = cons->add_producer(cons, prod);
 
 	if (cons->start)
 		cons->start(cons);
 	if (prod->start)
 		prod->start(prod);
-err_add_producer:
-	if (prod->del_consumer)
-		prod->del_consumer(prod, cons);
-err_add_consumer:
+
 	return ret;
 }
 

-- 
Without deviation from the norm, progress is not possible.



More information about the linux-arm-kernel mailing list