Question on guest enable msi fail when using GICv4/4.1

Marc Zyngier maz at kernel.org
Mon May 10 00:49:13 PDT 2021


Hi Eric,

On Sun, 09 May 2021 18:00:04 +0100,
Auger Eric <eric.auger at redhat.com> wrote:
> 
> Hi,
> On 5/7/21 1:02 PM, Marc Zyngier wrote:
> > On Fri, 07 May 2021 10:58:23 +0100,
> > Shaokun Zhang <zhangshaokun at hisilicon.com> wrote:
> >>
> >> Hi Marc,
> >>
> >> Thanks for your quick reply.
> >>
> >> On 2021/5/7 17:03, Marc Zyngier wrote:
> >>> On Fri, 07 May 2021 06:57:04 +0100,
> >>> Shaokun Zhang <zhangshaokun at hisilicon.com> wrote:
> >>>>
> >>>> [This letter comes from Nianyao Tang]
> >>>>
> >>>> Hi,
> >>>>
> >>>> With GICv4/4.1 and the MSI capability, a guest VF driver that requires
> >>>> 3 vectors and enables MSI causes the guest to get stuck.
> >>>
> >>> Stuck how?
> >>
> >> The guest serial console no longer responds and the guest network shuts down.
> >>
> >>>
> >>>> QEMU gets the number of interrupts from the Multiple Message Capable
> >>>> field set by the guest. This field is aligned to a power of 2 (if a
> >>>> function requires 3 vectors, it initializes the field to 2, i.e. 4 vectors).
> >>>
> >>> So I guess this is a MultiMSI device with 4 vectors, right?
> >>>
> >>
> >> Yes, it can support a maximum of 32 MSI interrupts, and the VF driver only uses 3 of them.
> >>
> >>>> However, the guest driver only sends 3 mapi-cmd to the vITS, so only 3
> >>>> ITE entries are recorded in the host. VFIO initializes the MSI
> >>>> interrupts using the count of 4 provided by QEMU. When it comes to the
> >>>> 4th MSI, which has no ITE in the vITS, irq_bypass_register_producer
> >>>> fails to __connect the producer and consumer because find_ite fails,
> >>>> and the guest is not resumed.
> >>>
> >>> Let me rephrase this to check that I understand it:
> >>> - The device has 4 vectors
> >>> - The guest only creates mappings for 3 of them
> >>> - VFIO calls kvm_vgic_v4_set_forwarding() for each vector
> >>> - KVM doesn't have a mapping for the 4th vector and returns an error
> >>> - VFIO disables this 4th vector
> >>>
> >>> Is that correct? If yes, I don't understand why that impacts the guest
> >>> at all. From what I can see, vfio_msi_set_vector_signal() just prints
> >>> a message on the console and carries on.
> >>>
> >>
> >> function calls:
> >> --> vfio_msi_set_vector_signal
> >>    --> irq_bypass_register_producer
> >>       -->__connect
> >>
> >> In __connect, add_producer ends up calling kvm_vgic_v4_set_forwarding,
> >> which fails to get the 4th mapping. When add_producer fails, __connect
> >> does not call cons->start, which would call kvm_arch_irq_bypass_start
> >> and then kvm_arm_resume_guest.
> > 
> > [+Eric, who wrote the irq_bypass infrastructure.]
> > 
> > Ah, so the guest is actually paused, not in a livelock situation
> > (which is how I interpreted "stuck").
> > 
> > I think we should handle this case gracefully, as there should be no
> > expectation that the guest will be using this interrupt. Given that
> > VFIO seems to be pretty unfazed when a producer fails, I'm tempted to
> > do the same thing and restart the guest.
> > 
> > Also, __disconnect doesn't care about errors, so why should __connect
> > have this odd behaviour?
> 
> __disconnect() does not care because tearing things down should always
> succeed; the del_* ops are void functions. By contrast, setting up the
> bypass can fail.
>
> Effectively
> a979a6aa009f ("irqbypass: do not start cons/prod when failed connect")
> needs to be reverted.
> 
> I agree the kerneldoc comments in linux/irqbypass.h could be improved to
> better explain the role of the stop/start callbacks and warn about their
> potential global impact.

Yup. It also begs the question of why we have producer callbacks, as
nobody seems to use them.

> wrt the case above, "in __connect, add_producer finally calls
> kvm_vgic_v4_set_forwarding and fails to get the 4th mapping", shouldn't
> we succeed in that case?

From a KVM perspective, we can't return a success because there is no
guest LPI that matches the input signal.
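
To illustrate the mismatch, here is a small userspace sketch. It is not
kernel code: toy_find_mapping(), the guest_mapped_events table and the -1
error value are made up for the example. It only shows how a request for 3
vectors is rounded up to 4 by the log2 encoding, and why a lookup for the
4th vector then has nothing to match.

```c
#include <assert.h>

/*
 * Smallest order such that (1 << order) >= nvec. This mirrors the log2
 * encoding of the MSI Multiple Message fields (3 vectors -> 2 -> 4 vectors).
 */
unsigned int msi_order(unsigned int nvec)
{
	unsigned int order = 0;

	assert(nvec > 0);
	while ((1u << order) < nvec)
		order++;
	return order;
}

#define NR_GUEST_MAPPINGS 3

/* Hypothetical stand-in for the guest's translations: events 0..2 only. */
static const unsigned int guest_mapped_events[NR_GUEST_MAPPINGS] = { 0, 1, 2 };

/* Returns 0 if the guest mapped this event, -1 (illustrative) otherwise. */
int toy_find_mapping(unsigned int event)
{
	for (unsigned int i = 0; i < NR_GUEST_MAPPINGS; i++)
		if (guest_mapped_events[i] == event)
			return 0;
	return -1;
}
```

With these definitions, msi_order(3) gives 2, so 4 vectors get set up on
the host side, and toy_find_mapping(3) fails because the guest never mapped
that event.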

And such failure seems to be expected by the VFIO code, which just
prints a message on the console and sets the producer token to NULL. So
returning an error from the KVM code is useful, at least to an extent.
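
As a rough illustration of the two behaviours being discussed, here is a
userspace toy model. All toy_* names are hypothetical and this is not the
irqbypass code itself; it only assumes, as described in this thread, that
stopping the consumer pauses the guest and starting it resumes the guest.
Variant A bails out on error without restarting; variant B always restarts
and still reports the error to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy consumer, not the kernel's struct irq_bypass_consumer. */
struct toy_consumer {
	bool guest_paused;	/* models the guest being halted by stop() */
	bool has_mapping;	/* models whether the guest mapped the vector */
};

/* Toy producer: only carries the token the caller clears on failure. */
struct toy_producer {
	void *token;
};

void toy_stop(struct toy_consumer *c)  { c->guest_paused = true; }
void toy_start(struct toy_consumer *c) { c->guest_paused = false; }

/* Fails (illustrative -1) when there is no guest mapping to forward to. */
int toy_add_producer(struct toy_consumer *c)
{
	return c->has_mapping ? 0 : -1;
}

/* Variant A: on error, return without restarting the consumer. */
int toy_connect_no_restart(struct toy_consumer *c)
{
	int ret;

	assert(c != NULL);
	toy_stop(c);
	ret = toy_add_producer(c);
	if (ret)
		return ret;	/* the guest stays paused */
	toy_start(c);
	return 0;
}

/* Variant B: always restart the consumer, and still report the error. */
int toy_connect_always_restart(struct toy_consumer *c)
{
	int ret;

	assert(c != NULL);
	toy_stop(c);
	ret = toy_add_producer(c);
	toy_start(c);
	return ret;
}

/* Toy caller: on failure, drop the token and carry on. */
void toy_register(struct toy_producer *p, struct toy_consumer *c,
		  int (*connect)(struct toy_consumer *))
{
	if (connect(c))
		p->token = NULL;
}
```

For a consumer without a mapping, both variants return an error, but only
variant A leaves guest_paused set. In both cases the toy caller ends up
with a NULL token, so the error remains visible to it.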

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


