[RFC PATCH 00/13] Introduce per-vCPU vLPI injection control API
Marc Zyngier
maz at kernel.org
Thu Nov 20 06:40:28 PST 2025
Maximilian: you keep ignoring the reviewers that are listed in
MAINTAINERS. This isn't acceptable. Next time, I will simply ignore
your patches.
On Thu, 20 Nov 2025 14:02:49 +0000,
Maximilian Dittgen <mdittgen at amazon.de> wrote:
>
> At the moment, direct injection of vLPIs can only be enabled on an
> all-or-nothing, per-VM basis, causing unnecessary I/O performance
> loss in cases where a VM's vCPU count exceeds the number of available
> vPEs. This RFC introduces per-vCPU control over vLPI injection to
> recover the potential I/O performance gain in such situations.
>
> Background
> ----------
>
> The value of dynamically enabling the direct injection of vLPIs on a
> per-vCPU basis is the ability to run guest VMs with simultaneous
> hardware-forwarded and software-forwarded message-signaled interrupts.
>
> Currently, hardware-forwarded vLPI direct injection on a KVM guest
> requires GICv4 and is enabled on a per-VM, all-or-nothing basis. vLPI
> injection enablement happens in two stages:
>
> 1) At vGIC initialization, allocate direct injection structures for
> each vCPU (doorbell IRQ, vPE table entry, virtual pending table,
> vPEID).
> 2) When a PCI device is configured for passthrough, map its MSIs to
> vLPIs using the structures allocated in step 1.
>
> Step 1 is all-or-nothing; if any vCPU cannot be configured with the
> vPE structures necessary for direct injection, the vPEs of all vCPUs
> are torn down and direct injection is disabled VM-wide.
>
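> A rough sketch of how step 1 plays out today (simplified from
> vgic_v4_init() in arch/arm64/kvm/vgic/vgic-v4.c; locking, early-out
> checks and the doorbell wiring elided):
>
>     int vgic_v4_init(struct kvm *kvm)
>     {
>             struct vgic_dist *dist = &kvm->arch.vgic;
>             int nr_vcpus = atomic_read(&kvm->online_vcpus);
>             struct kvm_vcpu *vcpu;
>             unsigned long i;
>             int ret;
>
>             /* One vPE slot per vCPU, allocated up front. */
>             dist->its_vm.vpes = kcalloc(nr_vcpus,
>                                         sizeof(*dist->its_vm.vpes),
>                                         GFP_KERNEL_ACCOUNT);
>             if (!dist->its_vm.vpes)
>                     return -ENOMEM;
>             dist->its_vm.nr_vpes = nr_vcpus;
>
>             kvm_for_each_vcpu(i, vcpu, kvm)
>                     dist->its_vm.vpes[i] =
>                             &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
>
>             /*
>              * Allocates a doorbell IRQ and a vPEID for *every*
>              * vCPU; a single failure tears everything down.
>              */
>             ret = its_alloc_vcpu_irqs(&dist->its_vm);
>             if (ret < 0) {
>                     kfree(dist->its_vm.vpes);
>                     dist->its_vm.vpes = NULL;
>                     dist->its_vm.nr_vpes = 0;
>                     return ret;
>             }
>             ...
>     }
>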
> This all-or-nothing enablement of direct vLPI injection raises
> several issues, the most pressing being performance degradation on
> overcommitted hosts.
>
> VM-wide vLPI enablement creates resource inefficiency when guest
> VMs have more vCPUs than the host has available vPEIDs. The number
> of vPEIDs (and consequently, vPEs) a host can allocate is constrained
> by hardware: the vPEID width is GICD_TYPER2.VID + 1 bits, for at most
> 2^(VID + 1) vPEIDs (ITS_MAX_VPEID). Since direct injection requires a
> vCPU to be assigned a vPEID, at most ITS_MAX_VPEID vCPUs can be
> configured for direct injection at a time. Because vLPI direct
> injection is all-or-nothing on a VM, if a new guest VM would exhaust
> the remaining vPEIDs, all vCPUs of that VM fall back to
> hypervisor-forwarded LPIs, causing considerable I/O performance
> degradation.
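>
> For reference, a minimal sketch of that budget computation, using the
> GICD_TYPER2 field definitions from include/linux/irqchip/arm-gic-v3.h
> (max_vpeids() is a made-up helper, shown for illustration only):
>
>     #include <linux/bitfield.h>
>     #include <linux/irqchip/arm-gic-v3.h>
>
>     static u32 max_vpeids(u32 typer2)
>     {
>             /*
>              * With GICD_TYPER2.VIL set, VID encodes the vPEID width
>              * minus one; otherwise the width is the architectural
>              * maximum of 16 bits.
>              */
>             u32 bits = (typer2 & GICD_TYPER2_VIL) ?
>                        FIELD_GET(GICD_TYPER2_VID, typer2) + 1 : 16;
>
>             return 1U << bits;      /* the hardware's vPEID budget */
>     }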
>
> Such performance degradation is exemplified on hosts with CPU
> overcommitment. Overcommitting an arbitrarily high number of vCPUs
> enables a VM's vCPU count to easily exceed the host's available vPEIDs.
Let it be crystal clear: GICv4 and overcommitment is a non-story. It
isn't designed for that. If that's what you are trying to achieve, you
clearly didn't get the memo.
> Even with marginally more vCPUs than vPEIDs, the current
> all-or-nothing vLPI paradigm disables direct injection entirely.
> This creates two problems: first, a single many-vCPU overcommitted
> VM loses all direct injection despite the host still having vPEIDs
> available;
Are you saying that your HW is so undersized that you cannot create a
*single VM* with direct injection? You really have fewer than 9 bits'
worth of VPEIDs? I'm sorry, but that's laughable. Even a $200 dev
board does better.
> second, on multi-tenant
> hosts, VMs booted first consume all vPEIDs, leaving later VMs without
> direct injection regardless of their I/O intensity. Per-vCPU control
> would allow userspace to allocate available vPEIDs across VMs based on
> I/O workload rather than boot order or per-VM vCPU count. This per-vCPU
> granularity recovers most of the direct injection performance benefit
> instead of losing it completely.
>
> To allow this per-vCPU granularity, this RFC introduces three new
> ioctls to the KVM API that let userspace activate/deactivate direct
> vLPI injection capability and resources for individual vCPUs ad hoc
> during VM runtime.
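>
> As a sketch of the intended usage (the ioctl name, number and
> argument layout below are placeholders for illustration, not the
> identifiers actually added by this series):
>
>     /* Hypothetical userspace snippet. */
>     #include <sys/ioctl.h>
>     #include <err.h>
>     #include <linux/kvm.h>
>
>     struct kvm_vlpi_ctrl {
>             __u32 enable;   /* 1 = direct injection, 0 = SW LPIs */
>             __u32 pad;
>     };
>
>     /* Made-up name and ioctl number, for illustration only. */
>     #define KVM_SET_VLPI_FORWARDING \
>             _IOW(KVMIO, 0xff, struct kvm_vlpi_ctrl)
>
>     void set_vlpi_forwarding(int vcpu_fd, int on)
>     {
>             struct kvm_vlpi_ctrl ctrl = { .enable = on };
>
>             if (ioctl(vcpu_fd, KVM_SET_VLPI_FORWARDING, &ctrl) < 0)
>                     err(1, "toggling direct vLPI injection");
>     }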
How can that even work when changing the affinity of a (directly
injected) vLPI to a vcpu that doesn't have direct injection enabled?
You'd have to unmap the vLPI and plug it back in as a normal LPI. Not
only is this absolutely ridiculous from a performance perspective, you
are also guaranteed to lose interrupts that would have fired in the
meantime. Losing interrupts is a total no-go.
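
Concretely, the sequence I'm objecting to would look something like
this (sketched with the existing irq-gic-v4.c helper; illustrative
only, not code from this series):

    /* The vLPI is currently HW-forwarded to the old vPE. */
    its_unmap_vlpi(host_irq);   /* DISCARD: the vLPI ceases to exist */

    /*
     * Window: anything that fired while the IRQ was a vLPI and is
     * still latched in the old vPE's virtual pending table never
     * makes it back to the physical pending table. Those interrupts
     * are simply lost.
     */

    /* From here on, the MSI fires as a host LPI and has to be
     * SW-injected into the vgic on every delivery. */
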
Before I even look at the code, I need you to explain how you are
dealing with this.
M.
--
Without deviation from the norm, progress is not possible.