[PATCH v6 29/39] KVM: arm64: gic-v5: Enlighten arch timer for GICv5
Sascha Bischoff
Sascha.Bischoff at arm.com
Thu Mar 19 01:59:40 PDT 2026
On Tue, 2026-03-17 at 18:05 +0000, Marc Zyngier wrote:
> On Tue, 17 Mar 2026 11:47:29 +0000,
> Sascha Bischoff <Sascha.Bischoff at arm.com> wrote:
> >
> > Now that GICv5 has arrived, the arch timer requires some TLC to
> > address some of the key differences introduced with GICv5.
> >
> > For PPIs on GICv5, the queue_irq_unlock irq_op is used, as AP lists
> > are not required at all for GICv5. The arch timer also introduces an
> > irq_op - get_input_level. Extend the arch-timer-provided irq_ops to
> > include the PPI op for vgic_v5 guests.
> >
> > When possible, DVI (Direct Virtual Interrupt) is set for PPIs when
> > using a vgic_v5, which directly injects the pending state into the
> > guest. This means that the host never sees these interrupts destined
> > for the guest. This has three impacts.
> >
> > * First of all, the kvm_cpu_has_pending_timer check is updated to
> > explicitly check if the timers are expected to fire.
> >
> > * Secondly, mapped timers (which use DVI) must be masked on the
> > host prior to entering a GICv5 guest, and unmasked on the return
> > path. This is handled in set_timer_irq_phys_masked.
> >
> > * Thirdly, it makes zero sense to attempt to inject state for a
> > DVI'd interrupt. Track which timers are direct, and skip the call
> > to kvm_vgic_inject_irq() for these.
> >
> > The final, but rather important, change is that the architected
> > PPIs for the timers are made mandatory for a GICv5 guest. Attempts
> > to set them to anything else are actively rejected. Once a vgic_v5
> > is initialised, the arch timer PPIs are also explicitly
> > reinitialised to ensure the correct GICv5-compatible PPIs are used
> > - this also adds in the GICv5 PPI type to the intid.
> >
> > Signed-off-by: Sascha Bischoff <sascha.bischoff at arm.com>
> > Reviewed-by: Jonathan Cameron <jonathan.cameron at huawei.com>
> > ---
> > arch/arm64/kvm/arch_timer.c | 110 ++++++++++++++++++++++++++------
> > arch/arm64/kvm/vgic/vgic-init.c | 9 +++
> > arch/arm64/kvm/vgic/vgic-v5.c | 7 +-
> > include/kvm/arm_arch_timer.h | 11 +++-
> > include/kvm/arm_vgic.h | 3 +
> > 5 files changed, 115 insertions(+), 25 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index 53312b88c342d..4575c36cae537 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -56,6 +56,12 @@ static struct irq_ops arch_timer_irq_ops = {
> > .get_input_level = kvm_arch_timer_get_input_level,
> > };
> >
> > +static struct irq_ops arch_timer_irq_ops_vgic_v5 = {
> > + .get_input_level = kvm_arch_timer_get_input_level,
> > + .queue_irq_unlock = vgic_v5_ppi_queue_irq_unlock,
> > + .set_direct_injection = vgic_v5_set_ppi_dvi,
> > +};
> > +
> > static int nr_timers(struct kvm_vcpu *vcpu)
> > {
> > if (!vcpu_has_nv(vcpu))
> > @@ -177,6 +183,10 @@ void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
> > map->emul_ptimer = vcpu_ptimer(vcpu);
> > }
> >
> > + map->direct_vtimer->direct = true;
> > + if (map->direct_ptimer)
> > + map->direct_ptimer->direct = true;
> > +
> > trace_kvm_get_timer_map(vcpu->vcpu_id, map);
> > }
> >
> > @@ -396,7 +406,11 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
> >
> > int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
> > {
> > -	return vcpu_has_wfit_active(vcpu) && wfit_delay_ns(vcpu) == 0;
> > + struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > + struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> > +
> > +	return kvm_timer_should_fire(vtimer) || kvm_timer_should_fire(ptimer) ||
> > +	       (vcpu_has_wfit_active(vcpu) && wfit_delay_ns(vcpu) == 0);
> > }
> >
> > /*
> > @@ -447,6 +461,10 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> > if (userspace_irqchip(vcpu->kvm))
> > return;
> >
> > +	/* Skip injecting on GICv5 for directly injected (DVI'd) timers */
> > + if (vgic_is_v5(vcpu->kvm) && timer_ctx->direct)
> > + return;
> > +
> > 	kvm_vgic_inject_irq(vcpu->kvm, vcpu, timer_irq(timer_ctx),
> > 			    timer_ctx->irq.level,
> > @@ -657,6 +675,24 @@ static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo
> > WARN_ON(r);
> > }
> >
> > +/*
> > + * On GICv5 we use DVI for the arch timer PPIs. This is restored later
> > + * on as part of vgic_load. Therefore, in order to avoid the guest's
> > + * interrupt making it to the host we mask it before entering the
> > + * guest and unmask it again when we return.
> > + */
> > +static inline void set_timer_irq_phys_masked(struct arch_timer_context *ctx, bool masked)
> > +{
> > +	if (masked) {
> > +		disable_percpu_irq(ctx->host_timer_irq);
> > +	} else {
> > +		if (ctx->host_timer_irq == host_vtimer_irq)
> > +			enable_percpu_irq(ctx->host_timer_irq, host_vtimer_irq_flags);
> > +		else
> > +			enable_percpu_irq(ctx->host_timer_irq, host_ptimer_irq_flags);
> > +	}
> > +}
>
> I think this is missing a trick, which is to reuse the mask/unmask
> infrastructure we use for the fruity crap. How about this following
> untested hack?
>
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 600f250753b45..b29bea800e2ab 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -660,7 +660,7 @@ static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo
> static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
> {
> struct kvm_vcpu *vcpu = timer_context_to_vcpu(ctx);
> - bool phys_active = false;
> + bool phys_active = vgic_is_v5(vcpu->kvm);
Note: This needs to be or'd in later as it gets overwritten by
kvm_vgic_map_is_active().
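To spell that out: on top of the hack above, the later lookup in
kvm_timer_vcpu_load_gic() needs to become an OR rather than a plain
assignment, so the GICv5 value isn't lost. Roughly (untested sketch):

	bool phys_active = vgic_is_v5(vcpu->kvm);

	/* existing kvm_vgic_map_is_active() lookup, now OR'd in */
	phys_active |= kvm_vgic_map_is_active(vcpu, timer_irq(ctx));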
>
> /*
> * Update the timer output so that it is likely to match the
> @@ -934,6 +934,12 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>
> if (kvm_vcpu_is_blocking(vcpu))
> kvm_timer_blocking(vcpu);
> +
> +	if (vgic_is_v5(vcpu)) {
> +		set_timer_irq_phys_active(map.direct_vtimer, false);
> +		if (map.direct_ptimer)
> +			set_timer_irq_phys_active(map.direct_ptimer, false);
> +	}
> }
>
> void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
> @@ -1333,7 +1339,8 @@ static int kvm_irq_init(struct arch_timer_kvm_info *info)
> host_vtimer_irq = info->virtual_irq;
> 	kvm_irq_fixup_flags(host_vtimer_irq, &host_vtimer_irq_flags);
>
> -	if (kvm_vgic_global_state.no_hw_deactivation) {
> +	if (kvm_vgic_global_state.no_hw_deactivation ||
> +	    kvm_vgic_global_state.type == VGIC_V5) {
> struct fwnode_handle *fwnode;
> struct irq_data *data;
>
> @@ -1351,7 +1358,8 @@ static int kvm_irq_init(struct arch_timer_kvm_info *info)
> return -ENOMEM;
> }
>
> -	arch_timer_irq_ops.flags |= VGIC_IRQ_SW_RESAMPLE;
> +	if (kvm_vgic_global_state.no_hw_deactivation)
> +		arch_timer_irq_ops.flags |= VGIC_IRQ_SW_RESAMPLE;
> 	WARN_ON(irq_domain_push_irq(domain, host_vtimer_irq,
> 				    (void *)TIMER_VTIMER));
> }
>
> which should avoid adding some new masking stuff.
Thanks for this, Marc. I've given it a go, and have eventually been
able to make it work. Things were, as they always are, a little more
complex.
First of all, the GICv5 irqchip driver doesn't register an
irq_set_type() handler for PPIs, as those do not have a configurable
handling/trigger mode. I believe we originally had this in the
prototyping, but given that all it could do was check that the
hardware matched whatever firmware said, it was dropped as part of
upstreaming. irq_set_type() is marked as optional in the genericirq
documentation, so this seemed like a fine thing to do.
However, as it turns out, things fall over if one layers a domain on
top of a domain that doesn't implement irq_set_type() and then calls
request_percpu_irq(). Somewhere in the depths of that,
__irq_set_trigger() is called, which returns -ENOSYS if the parent
domain doesn't have irq_set_type() populated.
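For reference, the hierarchy helper in kernel/irq/chip.c reads roughly
as follows, and is where the -ENOSYS comes from when the parent chip
leaves irq_set_type() unset:

	int irq_chip_set_type_parent(struct irq_data *data, unsigned int type)
	{
		data = data->parent_data;

		if (data->chip->irq_set_type)
			return data->chip->irq_set_type(data, type);

		return -ENOSYS;
	}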
This means that, without an irq_set_type() in the GICv5 irqchip
driver, we bail out in kvm_timer_hyp_init() with your above change.
I'm not sure if this is a deficiency in the GICv5 irqchip driver, or
one in the irqchip subsystem itself. As I said, the function is
marked as optional in the documentation
(Documentation/core-api/genericirq.rst), and this suggests to me that
it isn't optional when one has a domain hierarchy rather than a
single flat domain.
I worked around this with:
diff --git a/drivers/irqchip/irq-gic-v5.c b/drivers/irqchip/irq-gic-v5.c
index 405a5eee847b6..6b0903be8ebfd 100644
--- a/drivers/irqchip/irq-gic-v5.c
+++ b/drivers/irqchip/irq-gic-v5.c
@@ -511,6 +511,23 @@ static bool gicv5_ppi_irq_is_level(irq_hw_number_t hwirq)
return !!(read_ppi_sysreg_s(hwirq, PPI_HM) & bit);
}
+static int gicv5_ppi_irq_set_type(struct irq_data *d, unsigned int type)
+{
+	/*
+	 * GICv5's PPIs do not have a configurable trigger or handling
+	 * mode. Check that the attempt to set a type matches what the
+	 * hardware reports in the HMR, and error on a mismatch.
+	 */
+
+	if (type & IRQ_TYPE_EDGE_BOTH && gicv5_ppi_irq_is_level(d->hwirq))
+		return -EINVAL;
+
+	if (type & IRQ_TYPE_LEVEL_MASK && !gicv5_ppi_irq_is_level(d->hwirq))
+		return -EINVAL;
+
+	return 0;
+}
+
static int gicv5_ppi_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
{
if (vcpu)
@@ -526,6 +543,7 @@ static const struct irq_chip gicv5_ppi_irq_chip = {
.irq_mask = gicv5_ppi_irq_mask,
.irq_unmask = gicv5_ppi_irq_unmask,
.irq_eoi = gicv5_ppi_irq_eoi,
+ .irq_set_type = gicv5_ppi_irq_set_type,
.irq_get_irqchip_state = gicv5_ppi_irq_get_irqchip_state,
.irq_set_irqchip_state = gicv5_ppi_irq_set_irqchip_state,
.irq_set_vcpu_affinity = gicv5_ppi_irq_set_vcpu_affinity,
It is noddy, but it "fixes" the issue when requesting an irq.
The next issue is around EOIing. When running GICv3 guests that make
use of the HW bit in the LRs, and hence rely on HW deactivation on a
GICv5 host, we handle this in the host irqchip driver. Specifically,
we do the following for PPIs:
static void gicv5_ppi_irq_eoi(struct irq_data *d)
{
	/* Skip deactivate for forwarded PPI interrupts */
	if (irqd_is_forwarded_to_vcpu(d)) {
		gic_insn(0, CDEOI);
		return;
	}

	gicv5_hwirq_eoi(d->hwirq, GICV5_HWIRQ_TYPE_PPI);
}
The arch_timer irqchip's EOI as it currently stands completely skips
the EOI callback for forwarded irqs. This doesn't work for GICv3
guests on GICv5, as that means they never get EOI'd, since we emulate
that in software. Therefore, one needs to explicitly catch that case
and call the host irqchip driver's EOI on GICv5 hosts:
 static void timer_irq_eoi(struct irq_data *d)
 {
-	if (!irqd_is_forwarded_to_vcpu(d))
+	/*
+	 * On a GICv5 host, we still need to call EOI on the parent for
+	 * PPIs. The host driver already handles irqs which are forwarded to
+	 * vcpus, and skips the GIC CDDI while still doing the GIC CDEOI. This
+	 * is required to emulate EOIMode=1 on GICv5 hardware. Failure to
+	 * call EOI unsurprisingly results in *BAD* lock-ups.
+	 */
+	if (!irqd_is_forwarded_to_vcpu(d) ||
+	    kvm_vgic_global_state.type == VGIC_V5)
 		irq_chip_eoi_parent(d);
 }
In the end, after making these changes, I've been able to get this
working for the arch_timer code, and can completely remove the
bespoke GICv5 masking.
Thanks,
Sascha
>
> Thanks,
>
> M.
>