[PATCH v2 18/36] KVM: arm64: gic-v5: Implement PPI interrupt injection
Sascha Bischoff
Sascha.Bischoff at arm.com
Thu Jan 8 06:43:32 PST 2026
On Wed, 2026-01-07 at 12:50 +0000, Jonathan Cameron wrote:
> On Fri, 19 Dec 2025 15:52:42 +0000
> Sascha Bischoff <Sascha.Bischoff at arm.com> wrote:
>
> > This change introduces interrupt injection for PPIs for GICv5-based
> > guests.
> >
> > The lifecycle of PPIs is largely managed by the hardware for a
> > GICv5
> > system. The hypervisor injects pending state into the guest by
> > using
> > the ICH_PPI_PENDRx_EL2 registers. These are used by the hardware to
> > pick a Highest Priority Pending Interrupt (HPPI) for the guest
> > based
> > on the enable state of each individual interrupt. The enable state
> > and
> > priority for each interrupt are provided by the guest itself
> > (through
> > writes to the PPI registers).
> >
> > When Direct Virtual Interrupt (DVI) is set for a particular PPI,
> > the
> > hypervisor is even able to skip the injection of the pending state
> > altogether - it all happens in hardware.
> >
> > The result of the above is that no AP lists are required for GICv5,
> > unlike for older GICs. Instead, for PPIs the ICH_PPI_* registers
> > fulfil the same purpose for all 128 PPIs. Hence, as long as the
> > ICH_PPI_* registers are populated prior to guest entry, and merged
> > back into the KVM shadow state on exit, the PPI state is preserved,
> > and interrupts can be injected.
> >
> > When injecting the state of a PPI the state is merged into the
> > KVM's
> > shadow state using the set_pending_state irq_op. The directly sets
> > the
> > relevant bit in the shadow ICH_PPI_PENDRx_EL2, which is presented
> > to
> > the guest (and GICv5 hardware) on next guest entry. The
> > queue_irq_unlock irq_op is required to kick the vCPU to ensure that
> > it
> > seems the new state. The result is that no AP lists are used for
> > private interrupts on GICv5.
> >
> > Prior to entering the guest, vgic_v5_flush_ppi_state is called from
> > kvm_vgic_flush_hwstate. The effectively snapshots the shadow PPI
> > pending state (twice - an entry and an exit copy) in order to track
> > any changes. These changes can come from a guest consuming an
> > interrupt or from a guest making an Edge-triggered interrupt
> > pending.
> >
> > When returning from running a guest, the guest's PPI state is
> > merged
> > back into KVM's shadow state in vgic_v5_merge_ppi_state from
> > kvm_vgic_sync_hwstate. The Enable and Active state is synced back
> > for
> > all PPIs, and the pending state is synced back for Edge PPIs (Level
> > is
> > driven directly by the devices generating said levels). The
> > incoming
> > pending state from the guest is merged with KVM's shadow state to
> > avoid losing any incoming interrupts.
> >
> > Signed-off-by: Sascha Bischoff <sascha.bischoff at arm.com>
> Minor things inline
>
> > ---
> > arch/arm64/kvm/vgic/vgic-v5.c | 159
> > ++++++++++++++++++++++++++++++++++
> > arch/arm64/kvm/vgic/vgic.c | 46 +++++++---
> > arch/arm64/kvm/vgic/vgic.h | 47 ++++++++--
> > include/kvm/arm_vgic.h | 3 +
> > 4 files changed, 235 insertions(+), 20 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-v5.c
> > b/arch/arm64/kvm/vgic/vgic-v5.c
> > index 46c70dfc7bb08..cb3dd872d16a6 100644
> > --- a/arch/arm64/kvm/vgic/vgic-v5.c
> > +++ b/arch/arm64/kvm/vgic/vgic-v5.c
> > @@ -56,6 +56,165 @@ int vgic_v5_probe(const struct gic_kvm_info
> > *info)
> > return 0;
> > }
> >
> > +static bool vgic_v5_ppi_set_pending_state(struct kvm_vcpu *vcpu,
> > + struct vgic_irq *irq)
> > +{
> > + struct vgic_v5_cpu_if *cpu_if;
> > + const u64 id_bit = BIT_ULL(irq->intid % 64);
>
> Obviously the % 64 means other bits of irq->intid above the HWIRQ_ID
> field don't matter, but this still seems a little odd. I'd extract
> the field first, then use that for the reg and id_bit or just
> do those inline where they are used.
>
> const u32 hwirq_id = FIELD_GET(GICV5_HWIRQ_ID, irq->intid);
>
> if (irq_is_pending(irq))
> cpu_if->vgic_ppi_pendr[hwirq_id / 64] |= hwirq_id %
> 64;
> ..
>
> Which matches style you used for similar cases in earlier patches.
Done.
>
> > + const u32 reg = FIELD_GET(GICV5_HWIRQ_ID, irq->intid) /
> > 64;
> > +
> > + if (!vcpu || !irq)
> > + return false;
> > +
> > + /* Skip injecting the state altogether */
> > + if (irq->directly_injected)
> > + return true;
> > +
> > + cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
> > +
> > + if (irq_is_pending(irq))
> > + cpu_if->vgic_ppi_pendr[reg] |= id_bit;
> > + else
> > + cpu_if->vgic_ppi_pendr[reg] &= ~id_bit;
> > +
> > + return true;
> > +}
>
>
> > +void vgic_v5_set_ppi_ops(struct vgic_irq *irq)
> > +{
> > + if (WARN_ON(!irq))
> > + return;
> > +
> > + scoped_guard(raw_spinlock, &irq->irq_lock) {
> Not checked on for whether code ends up outside this lock. If not
> use a guard(raw_spinlock)(&irq->irq_lock);
Have used guard() instead.
>
> > + if (!WARN_ON(irq->ops))
> > + irq->ops = &vgic_v5_ppi_irq_ops;
> > + }
> > +}
> > +
> > +/*
> > + * Detect any PPIs state changes, and propagate the state with
> > KVM's
> > + * shadow structures.
> > + */
> > +void vgic_v5_fold_ppi_state(struct kvm_vcpu *vcpu)
> > +{
> > + struct vgic_v5_cpu_if *cpu_if = &vcpu-
> > >arch.vgic_cpu.vgic_v5;
> > + int i, reg;
> > +
> > + for (reg = 0; reg < 2; reg++) {
> It's now considered fine to declare loop variables in the loop and
> always
> nice to limit their scope.
>
> for (int reg = 0; reg < 2...
Done.
>
> > + unsigned long changed_bits;
> > + const unsigned long enabler = cpu_if-
> > >vgic_ich_ppi_enabler_exit[reg];
> > + const unsigned long activer = cpu_if-
> > >vgic_ppi_activer_exit[reg];
> > + const unsigned long pendr = cpu_if-
> > >vgic_ppi_pendr_exit[reg];
>
> ...
>
> > +
> > +void vgic_v5_flush_ppi_state(struct kvm_vcpu *vcpu)
> > +{
> > + struct vgic_v5_cpu_if *cpu_if = &vcpu-
> > >arch.vgic_cpu.vgic_v5;
> > +
> > + /*
> > + * We're about to enter the guest. Copy the shadow state
> > to the pending
> > + * reg that will be written to the ICH_PPI_PENDRx_EL2
> > regs. While the
> > + * guest is running we track any incoming changes to the
> > pending state in
> > + * vgic_ppi_pendr. The incoming changes are merged with
> > the outgoing
> > + * changes on the return path.
> > + */
> > + cpu_if->vgic_ppi_pendr_entry[0] = cpu_if-
> > >vgic_ppi_pendr[0];
> > + cpu_if->vgic_ppi_pendr_entry[1] = cpu_if-
> > >vgic_ppi_pendr[1];
> > +
> > + /*
> > + * Make sure that we can correctly detect "edges" in the
> > PPI
> > + * state. There's a path where we never actually enter the
> > guest, and
> > + * failure to do this risks losing pending state
> > + */
> > + cpu_if->vgic_ppi_pendr_exit[0] = cpu_if-
> > >vgic_ppi_pendr[0];
> > + cpu_if->vgic_ppi_pendr_exit[1] = cpu_if-
> > >vgic_ppi_pendr[1];
> > +
> Drop this blank line.
Done.
>
> > +}
>
> > diff --git a/arch/arm64/kvm/vgic/vgic.c
> > b/arch/arm64/kvm/vgic/vgic.c
> > index ac8cb0270e1e4..cb5d43b34462b 100644
> > --- a/arch/arm64/kvm/vgic/vgic.c
> > +++ b/arch/arm64/kvm/vgic/vgic.c
>
> > @@ -258,10 +266,12 @@ struct kvm_vcpu *vgic_target_oracle(struct
> > vgic_irq *irq)
> > * If the distributor is disabled, pending interrupts
> > shouldn't be
> > * forwarded.
> > */
> > - if (irq->enabled && irq_is_pending(irq)) {
> > - if (unlikely(irq->target_vcpu &&
> > - !irq->target_vcpu->kvm-
> > >arch.vgic.enabled))
> > - return NULL;
> > + if (irq_is_enabled(irq) && irq_is_pending(irq)) {
> > + if (irq->target_vcpu) {
>
> Just from a readability point of view, maybe clearer to get rid of
> the 'else# path for this one first.
>
> if (!irq->target_vcpu)
> return NULL;
>
> if (!vgic_is_v5(irq->target_vcpu->kvm) &&
> unlikely(!irq->target_vcpu->kvm-
> >arch.vgic.enabled))
> return NULL;
>
> return irq->target_vcpu;
>
> Though I see this code might go away anyway...
Yeah, this has now gone away.
>
> > + if (!vgic_is_v5(irq->target_vcpu->kvm) &&
> > + unlikely(!irq->target_vcpu->kvm-
> > >arch.vgic.enabled))
> > + return NULL;
> > + }
> >
> > return irq->target_vcpu;
> > }
>
>
>
> > /* Flush our emulation state into the GIC hardware before entering
> > the guest. */
> > void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
> > {
> > @@ -1106,13 +1131,12 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu
> > *vcpu)
> >
> > DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
> >
> > - scoped_guard(raw_spinlock, &vcpu-
> > >arch.vgic_cpu.ap_list_lock)
> > - vgic_flush_lr_state(vcpu);
> > + vgic_flush_state(vcpu);
> >
> > if (can_access_vgic_from_kernel())
> > vgic_restore_state(vcpu);
> >
> > - if (vgic_supports_direct_irqs(vcpu->kvm))
> > + if (vgic_supports_direct_irqs(vcpu->kvm) &&
> > !vgic_is_v5(vcpu->kvm))
>
> This feels like a somewhat backwards check.
> No function to check it vgic_is_v4? Similar cases elsewhere.
There is not, because KVM doesn't support a vGICv4. One can create a
vGICv3-based guest, and if the underlying hardware is GICv4.x, then the
additional GICv4 features are used - such as the direct injection of
IRQs.
Instead of checking for !v5 I think it is cleaner to instead check
kvm_vgic_global_state.has_gicv4, which tells us that the host GIC is
v4. I'll try and update the relevant locations.
>
> > vgic_v4_commit(vcpu);
> > }
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic.h
> > b/arch/arm64/kvm/vgic/vgic.h
> > index d5d9264f0a1e9..978d7f8426361 100644
> > --- a/arch/arm64/kvm/vgic/vgic.h
> > +++ b/arch/arm64/kvm/vgic/vgic.h
> > @@ -132,6 +132,28 @@ static inline bool irq_is_pending(struct
> > vgic_irq *irq)
> > return irq->pending_latch || irq->line_level;
> > }
> >
> > +/* Requires the irq_lock to be held by the caller. */
>
> Can you use a lockdep notation to make that explicit?
I have actually dropped this as it didn't add anything useful once the
oracle code was cleaned up.
>
> > +static inline bool irq_is_enabled(struct vgic_irq *irq)
> > +{
> > + if (irq->enabled)
> > + return true;
> > +
> > + /*
> > + * We always consider GICv5 interrupts as enabled as we
> > can
> > + * always inject them. The state is handled by the
> > hardware,
> > + * and the hardware will only signal the interrupt to the
> > + * guest once the guest enables it.
>
> With my fussy reviewer hat on, that's wrapped a bit early. Go up
> to 80 chars for comments.
>
> > + */
> > + if (irq->target_vcpu) {
> > + u32 vgic_model = irq->target_vcpu->kvm-
> > >arch.vgic.vgic_model;
> > +
> > + if (vgic_model == KVM_DEV_TYPE_ARM_VGIC_V5)
> > + return true;
> > + }
> > +
> > + return false;
> > +}
>
> > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> > index 500709bd62c8d..b5180edbd1165 100644
> > --- a/include/kvm/arm_vgic.h
> > +++ b/include/kvm/arm_vgic.h
> > @@ -32,6 +32,9 @@
> > #define VGIC_MIN_LPI 8192
> > #define KVM_IRQCHIP_NUM_PINS (1020 - 32)
> >
> > +/* GICv5 constants */
> > +#define VGIC_V5_NR_PRIVATE_IRQS 128
>
> You have earlier checks against this value (there was one around PPI
> DVI setup
> a few patches back). So probably better to pull the define earlier
> and
> use it there as well?
Good shout. I've actually re-worked the DVI one a little, but will
check if it makes sense to move this to one of the earlier commits or
not.
>
> > +
> > #define is_v5_type(t, i) (FIELD_GET(GICV5_HWIRQ_TYPE, (i))
> > == (t))
> >
> > #define __irq_is_sgi(t,
> > i) \
>
Thanks as always,
Sascha
More information about the linux-arm-kernel
mailing list