[PATCH v2 18/36] KVM: arm64: gic-v5: Implement PPI interrupt injection
Jonathan Cameron
jonathan.cameron at huawei.com
Wed Jan 7 04:50:05 PST 2026
On Fri, 19 Dec 2025 15:52:42 +0000
Sascha Bischoff <Sascha.Bischoff at arm.com> wrote:
> This change introduces interrupt injection for PPIs for GICv5-based
> guests.
>
> The lifecycle of PPIs is largely managed by the hardware for a GICv5
> system. The hypervisor injects pending state into the guest by using
> the ICH_PPI_PENDRx_EL2 registers. These are used by the hardware to
> pick a Highest Priority Pending Interrupt (HPPI) for the guest based
> on the enable state of each individual interrupt. The enable state and
> priority for each interrupt are provided by the guest itself (through
> writes to the PPI registers).
>
> When Direct Virtual Interrupt (DVI) is set for a particular PPI, the
> hypervisor is even able to skip the injection of the pending state
> altogether - it all happens in hardware.
>
> The result of the above is that no AP lists are required for GICv5,
> unlike for older GICs. Instead, for PPIs the ICH_PPI_* registers
> fulfil the same purpose for all 128 PPIs. Hence, as long as the
> ICH_PPI_* registers are populated prior to guest entry, and merged
> back into the KVM shadow state on exit, the PPI state is preserved,
> and interrupts can be injected.
>
> When injecting the state of a PPI, the state is merged into KVM's
> shadow state using the set_pending_state irq_op. This directly sets the
> relevant bit in the shadow ICH_PPI_PENDRx_EL2, which is presented to
> the guest (and GICv5 hardware) on next guest entry. The
> queue_irq_unlock irq_op is required to kick the vCPU to ensure that it
> sees the new state. The result is that no AP lists are used for
> private interrupts on GICv5.
>
> Prior to entering the guest, vgic_v5_flush_ppi_state is called from
> kvm_vgic_flush_hwstate. This effectively snapshots the shadow PPI
> pending state (twice - an entry and an exit copy) in order to track
> any changes. These changes can come from a guest consuming an
> interrupt or from a guest making an Edge-triggered interrupt pending.
>
> When returning from running a guest, the guest's PPI state is merged
> back into KVM's shadow state in vgic_v5_merge_ppi_state from
> kvm_vgic_sync_hwstate. The Enable and Active state is synced back for
> all PPIs, and the pending state is synced back for Edge PPIs (Level is
> driven directly by the devices generating said levels). The incoming
> pending state from the guest is merged with KVM's shadow state to
> avoid losing any incoming interrupts.
>
> Signed-off-by: Sascha Bischoff <sascha.bischoff at arm.com>
Minor things inline
> ---
> arch/arm64/kvm/vgic/vgic-v5.c | 159 ++++++++++++++++++++++++++++++++++
> arch/arm64/kvm/vgic/vgic.c | 46 +++++++---
> arch/arm64/kvm/vgic/vgic.h | 47 ++++++++--
> include/kvm/arm_vgic.h | 3 +
> 4 files changed, 235 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
> index 46c70dfc7bb08..cb3dd872d16a6 100644
> --- a/arch/arm64/kvm/vgic/vgic-v5.c
> +++ b/arch/arm64/kvm/vgic/vgic-v5.c
> @@ -56,6 +56,165 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
> return 0;
> }
>
> +static bool vgic_v5_ppi_set_pending_state(struct kvm_vcpu *vcpu,
> + struct vgic_irq *irq)
> +{
> + struct vgic_v5_cpu_if *cpu_if;
> + const u64 id_bit = BIT_ULL(irq->intid % 64);
Obviously the % 64 means other bits of irq->intid above the HWIRQ_ID
field don't matter, but this still seems a little odd. I'd extract
the field first, then use that for reg and id_bit, or just do those
inline where they are used:

	const u32 hwirq_id = FIELD_GET(GICV5_HWIRQ_ID, irq->intid);

	if (irq_is_pending(irq))
		cpu_if->vgic_ppi_pendr[hwirq_id / 64] |= BIT_ULL(hwirq_id % 64);
	...

which matches the style you used for similar cases in earlier patches.
> + const u32 reg = FIELD_GET(GICV5_HWIRQ_ID, irq->intid) / 64;
> +
> + if (!vcpu || !irq)
> + return false;
> +
> + /* Skip injecting the state altogether */
> + if (irq->directly_injected)
> + return true;
> +
> + cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
> +
> + if (irq_is_pending(irq))
> + cpu_if->vgic_ppi_pendr[reg] |= id_bit;
> + else
> + cpu_if->vgic_ppi_pendr[reg] &= ~id_bit;
> +
> + return true;
> +}
> +void vgic_v5_set_ppi_ops(struct vgic_irq *irq)
> +{
> + if (WARN_ON(!irq))
> + return;
> +
> + scoped_guard(raw_spinlock, &irq->irq_lock) {
I haven't checked whether any code ends up outside this lock. If not,
use a guard(raw_spinlock)(&irq->irq_lock); instead.
> + if (!WARN_ON(irq->ops))
> + irq->ops = &vgic_v5_ppi_irq_ops;
> + }
> +}
> +
> +/*
> + * Detect any PPI state changes, and propagate the state into KVM's
> + * shadow structures.
> + */
> +void vgic_v5_fold_ppi_state(struct kvm_vcpu *vcpu)
> +{
> + struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
> + int i, reg;
> +
> + for (reg = 0; reg < 2; reg++) {
It's now considered fine to declare loop variables in the loop, and it's
always nice to limit their scope:

	for (int reg = 0; reg < 2...
> + unsigned long changed_bits;
> + const unsigned long enabler = cpu_if->vgic_ich_ppi_enabler_exit[reg];
> + const unsigned long activer = cpu_if->vgic_ppi_activer_exit[reg];
> + const unsigned long pendr = cpu_if->vgic_ppi_pendr_exit[reg];
...
> +
> +void vgic_v5_flush_ppi_state(struct kvm_vcpu *vcpu)
> +{
> + struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
> +
> + /*
> + * We're about to enter the guest. Copy the shadow state to the pending
> + * reg that will be written to the ICH_PPI_PENDRx_EL2 regs. While the
> + * guest is running we track any incoming changes to the pending state in
> + * vgic_ppi_pendr. The incoming changes are merged with the outgoing
> + * changes on the return path.
> + */
> + cpu_if->vgic_ppi_pendr_entry[0] = cpu_if->vgic_ppi_pendr[0];
> + cpu_if->vgic_ppi_pendr_entry[1] = cpu_if->vgic_ppi_pendr[1];
> +
> + /*
> + * Make sure that we can correctly detect "edges" in the PPI
> + * state. There's a path where we never actually enter the guest, and
> + * failure to do this risks losing pending state
> + */
> + cpu_if->vgic_ppi_pendr_exit[0] = cpu_if->vgic_ppi_pendr[0];
> + cpu_if->vgic_ppi_pendr_exit[1] = cpu_if->vgic_ppi_pendr[1];
> +
Drop this blank line.
> +}
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index ac8cb0270e1e4..cb5d43b34462b 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -258,10 +266,12 @@ struct kvm_vcpu *vgic_target_oracle(struct vgic_irq *irq)
> * If the distributor is disabled, pending interrupts shouldn't be
> * forwarded.
> */
> - if (irq->enabled && irq_is_pending(irq)) {
> - if (unlikely(irq->target_vcpu &&
> - !irq->target_vcpu->kvm->arch.vgic.enabled))
> - return NULL;
> + if (irq_is_enabled(irq) && irq_is_pending(irq)) {
> + if (irq->target_vcpu) {
Just from a readability point of view, maybe clearer to get rid of the
'else' path for this one by handling the early return first:

	if (!irq->target_vcpu)
		return NULL;

	if (!vgic_is_v5(irq->target_vcpu->kvm) &&
	    unlikely(!irq->target_vcpu->kvm->arch.vgic.enabled))
		return NULL;

	return irq->target_vcpu;

Though I see this code might go away anyway...
> + if (!vgic_is_v5(irq->target_vcpu->kvm) &&
> + unlikely(!irq->target_vcpu->kvm->arch.vgic.enabled))
> + return NULL;
> + }
>
> return irq->target_vcpu;
> }
> /* Flush our emulation state into the GIC hardware before entering the guest. */
> void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
> {
> @@ -1106,13 +1131,12 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>
> DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
>
> - scoped_guard(raw_spinlock, &vcpu->arch.vgic_cpu.ap_list_lock)
> - vgic_flush_lr_state(vcpu);
> + vgic_flush_state(vcpu);
>
> if (can_access_vgic_from_kernel())
> vgic_restore_state(vcpu);
>
> - if (vgic_supports_direct_irqs(vcpu->kvm))
> + if (vgic_supports_direct_irqs(vcpu->kvm) && !vgic_is_v5(vcpu->kvm))
This feels like a somewhat backwards check.
Is there no vgic_is_v4() style helper you could check instead? Similar
cases elsewhere.
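Purely as an illustration (the helper name and exact condition are my
guesses, nothing in this series defines it), I was thinking of something
along these lines, so the call site reads as a positive check:

	/* Illustrative sketch only: wraps the combined check used above */
	static inline bool vgic_is_v4(struct kvm *kvm)
	{
		return vgic_supports_direct_irqs(kvm) && !vgic_is_v5(kvm);
	}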
> vgic_v4_commit(vcpu);
> }
>
> diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
> index d5d9264f0a1e9..978d7f8426361 100644
> --- a/arch/arm64/kvm/vgic/vgic.h
> +++ b/arch/arm64/kvm/vgic/vgic.h
> @@ -132,6 +132,28 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
> return irq->pending_latch || irq->line_level;
> }
>
> +/* Requires the irq_lock to be held by the caller. */
Can you use a lockdep annotation to make that explicit?
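Something like this at the top of the function should do it (assuming
callers really do always hold the lock here):

	lockdep_assert_held(&irq->irq_lock);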
> +static inline bool irq_is_enabled(struct vgic_irq *irq)
> +{
> + if (irq->enabled)
> + return true;
> +
> + /*
> + * We always consider GICv5 interrupts as enabled as we can
> + * always inject them. The state is handled by the hardware,
> + * and the hardware will only signal the interrupt to the
> + * guest once the guest enables it.
With my fussy reviewer hat on, that's wrapped a bit early. Go up
to 80 chars for comments.
> + */
> + if (irq->target_vcpu) {
> + u32 vgic_model = irq->target_vcpu->kvm->arch.vgic.vgic_model;
> +
> + if (vgic_model == KVM_DEV_TYPE_ARM_VGIC_V5)
> + return true;
> + }
> +
> + return false;
> +}
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 500709bd62c8d..b5180edbd1165 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -32,6 +32,9 @@
> #define VGIC_MIN_LPI 8192
> #define KVM_IRQCHIP_NUM_PINS (1020 - 32)
>
> +/* GICv5 constants */
> +#define VGIC_V5_NR_PRIVATE_IRQS 128
You have earlier checks against this value (there was one around PPI DVI setup
a few patches back). So probably better to pull the define earlier and
use it there as well?
> +
> #define is_v5_type(t, i) (FIELD_GET(GICV5_HWIRQ_TYPE, (i)) == (t))
>
> #define __irq_is_sgi(t, i) \