[PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
David Woodhouse
dwmw2 at infradead.org
Fri Jun 20 11:00:12 PDT 2025
On Fri, 2025-06-20 at 10:22 -0700, Sean Christopherson wrote:
> On Fri, Jun 13, 2025, Oliver Upton wrote:
> > On Thu, Jun 12, 2025 at 07:34:35AM -0700, Sean Christopherson wrote:
> > > On Thu, Jun 12, 2025, Marc Zyngier wrote:
> > > > But not having a vLPI mapping for an interrupt at the point where we're
> > > > tearing down the forwarding is pretty benign. IRQs *still* go where they
> > > > should, and we don't lose anything.
> >
> > The VM may not actually be getting torn down, though. The series of
> > fixes [*] we took for 6.16 addressed games that VMMs might be playing on
> > irqbypass for a live VM.
> >
> > [*] https://lore.kernel.org/kvmarm/20250523194722.4066715-1-oliver.upton@linux.dev/
> >
> > > All of those failure scenarios seem like warnable offences when KVM thinks it has
> > > configured the IRQ to be forwarded to a vCPU.
> >
> > I tend to agree here, especially considering how horribly fragile GICv4
> > has been in some systems. I know of a couple implementations where ITS
> > command failures and/or unmapped MSIs are fatal for the entire machine.
> > Debugging them has been a genuine pain in the ass.
> >
> > WARN'ing when state tracking for vLPIs is out of whack would've made it
> > a little easier.
>
> Marc, does this look and read better?
>
> I'd really, really like to get this sorted out asap, as it's the only thing
> blocking the series, and I want to get the series into linux-next early next
> week, before I go OOO for ~10 days.
>
> --
> From: Sean Christopherson <seanjc at google.com>
> Date: Thu, 12 Jun 2025 16:51:47 -0700
> Subject: [PATCH] KVM: arm64: WARN if unmapping a vLPI fails in any path
>
> When unmapping a vLPI, WARN if nullifying vCPU affinity fails, not just if
> failure occurs when freeing an ITE. If undoing vCPU affinity fails, then
> odds are very good that vLPI state tracking has gotten out of whack,
> i.e. that KVM and the GIC disagree on the state of an IRQ/vLPI. At best,
> inconsistent state means there is a lurking bug/flaw somewhere. At worst,
> the inconsistency could eventually be fatal to the host, e.g. if an ITS
> command fails because KVM's view of things doesn't match reality/hardware.
Btw, we finally figured out the reason some machines were just going
dark on kexec, with the new kernel being corrupted. It turns out the
GIC is still scribbling on the vLPI Pending Table even after it isn't
the vLPI Pending Table any more, and is now part of the new kernel's
text.
In my queue I have a patch to call unmap_all_vpes() ∀ kvm on kexec,
which makes the problem go away.
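
To illustrate the idea only: below is a toy userspace model, not the queued
patch. Every name in it (struct model_vm, model_unmap_all_vpes(),
model_kexec_quiesce(), model_any_vpe_mapped()) is made up for illustration,
and the "vpes_mapped" flag is an assumed stand-in for "the GIC may still
write to this VM's vLPI Pending Table".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM; not the kernel's struct kvm. */
struct model_vm {
	const char *name;
	/* true while the modelled GIC may still write the pending table */
	bool vpes_mapped;
};

/* Models unmapping every vPE of one VM, so the GIC stops writing. */
static void model_unmap_all_vpes(struct model_vm *vm)
{
	vm->vpes_mapped = false;
}

/*
 * Models a pre-kexec hook: walk every VM and unmap its vPEs before
 * the old kernel's memory is handed over to the new kernel.
 * Returns how many VMs actually had to be unmapped.
 */
static size_t model_kexec_quiesce(struct model_vm *vms, size_t n)
{
	size_t unmapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (vms[i].vpes_mapped) {
			model_unmap_all_vpes(&vms[i]);
			unmapped++;
		}
	}
	return unmapped;
}

/* True if any VM could still have the GIC writing to old memory. */
static bool model_any_vpe_mapped(const struct model_vm *vms, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (vms[i].vpes_mapped)
			return true;
	}
	return false;
}
```

The point of the model is just the ordering: once model_kexec_quiesce() has
run over all VMs, model_any_vpe_mapped() is false, i.e. nothing is left that
could scribble on memory the new kernel is about to reuse.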