[PATCH v6 04/39] KVM: arm64: vgic: Split out mapping IRQs and setting irq_ops
Sascha Bischoff
Sascha.Bischoff at arm.com
Wed Mar 18 10:30:12 PDT 2026
On Tue, 2026-03-17 at 16:00 +0000, Marc Zyngier wrote:
> On Tue, 17 Mar 2026 11:40:59 +0000,
> Sascha Bischoff <Sascha.Bischoff at arm.com> wrote:
> >
> > Prior to this change, the act of mapping a virtual IRQ to a physical
> > one also set the irq_ops. Unmapping then reset the irq_ops to NULL.
> > So far, this has been fine and hasn't caused any major issues.
> >
> > Now, however, as GICv5 support is being added to KVM, it has become
> > apparent that conflating mapping/unmapping IRQs and setting/clearing
> > irq_ops can cause issues. The reason is that the upcoming GICv5
> > support introduces a set of default irq_ops for PPIs, and removing
> > this when unmapping will cause things to break rather horribly.
> >
> > Split out the mapping/unmapping of IRQs from the setting/clearing of
> > irq_ops. The arch timer code is updated to set the irq_ops following
> > a successful map. The irq_ops are intentionally not removed again on
> > an unmap as the only irq_op introduced by the arch timer only takes
> > effect if the hw bit in struct vgic_irq is set. Therefore, it is
> > safe to leave this in place, and it avoids additional complexity
> > when GICv5 support is introduced.
> >
> > Signed-off-by: Sascha Bischoff <sascha.bischoff at arm.com>
> > ---
> >  arch/arm64/kvm/arch_timer.c | 32 ++++++++++++++++++-------------
> >  arch/arm64/kvm/vgic/vgic.c  | 38 ++++++++++++++++++++++++++++++++------
> >  include/kvm/arm_vgic.h      |  5 ++++-
> >  3 files changed, 55 insertions(+), 20 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index 600f250753b45..1f536dd5978d4 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -740,14 +740,17 @@ static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
> >
> >  	ret = kvm_vgic_map_phys_irq(vcpu,
> >  				    map->direct_vtimer->host_timer_irq,
> > -				    timer_irq(map->direct_vtimer),
> > -				    &arch_timer_irq_ops);
> > -	WARN_ON_ONCE(ret);
> > +				    timer_irq(map->direct_vtimer));
> > +	if (!WARN_ON_ONCE(ret))
> > +		kvm_vgic_set_irq_ops(vcpu, timer_irq(map->direct_vtimer),
> > +				     &arch_timer_irq_ops);
> > +
> >  	ret = kvm_vgic_map_phys_irq(vcpu,
> >  				    map->direct_ptimer->host_timer_irq,
> > -				    timer_irq(map->direct_ptimer),
> > -				    &arch_timer_irq_ops);
> > -	WARN_ON_ONCE(ret);
> > +				    timer_irq(map->direct_ptimer));
> > +	if (!WARN_ON_ONCE(ret))
> > +		kvm_vgic_set_irq_ops(vcpu, timer_irq(map->direct_ptimer),
> > +				     &arch_timer_irq_ops);
>
> Do we really need this eager setting of ops? Given that nothing seems
> to clear them, why can't we just set the ops at vcpu init time? Given
> that this is a pretty hot path (on each exception/exception return
> between L2 and L1), the least we do here, the better.
Hmm, I think you're right. When making this change, I was trying to
preserve the existing behaviour, so I set the irq_ops on each map call.
However, as you say, nothing is clearing the ops (as things stand, at
least), so setting them once at vcpu init time does indeed make sense
to me.
>
> > }
> > }
> >
> > @@ -1565,20 +1568,23 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
> >
> >  	ret = kvm_vgic_map_phys_irq(vcpu,
> >  				    map.direct_vtimer->host_timer_irq,
> > -				    timer_irq(map.direct_vtimer),
> > -				    &arch_timer_irq_ops);
> > +				    timer_irq(map.direct_vtimer));
> >  	if (ret)
> >  		return ret;
> >
> > +	kvm_vgic_set_irq_ops(vcpu, timer_irq(map.direct_vtimer),
> > +			     &arch_timer_irq_ops);
> > +
> >  	if (map.direct_ptimer) {
> >  		ret = kvm_vgic_map_phys_irq(vcpu,
> >  					    map.direct_ptimer->host_timer_irq,
> > -					    timer_irq(map.direct_ptimer),
> > -					    &arch_timer_irq_ops);
> > -	}
> > +					    timer_irq(map.direct_ptimer));
> > +		if (ret)
> > +			return ret;
> >
> > -	if (ret)
> > -		return ret;
> > +		kvm_vgic_set_irq_ops(vcpu, timer_irq(map.direct_ptimer),
> > +				     &arch_timer_irq_ops);
> > +	}
>
> which would mean moving this to kvm_timer_vcpu_init().
This, however, is not quite that simple.
It turns out that we actually call kvm_timer_vcpu_init() before
kvm_vgic_vcpu_init() from kvm_arch_vcpu_create(), meaning that we don't
have the private IRQs allocated yet at that point.
Is there a good reason that the timer (and PMU) are initialised prior
to initialising the vgic?
I've tried making this change, and once I reorder the timer and vgic
initialisation I can confirm that things work both with and without
nested.
>
> >
> > no_vgic:
> > timer->enabled = 1;
> > diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> > index e22b79cfff965..e37c640d74bcf 100644
> > --- a/arch/arm64/kvm/vgic/vgic.c
> > +++ b/arch/arm64/kvm/vgic/vgic.c
> > @@ -553,10 +553,38 @@ int kvm_vgic_inject_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
> >  	return 0;
> >  }
> >
> > +void kvm_vgic_set_irq_ops(struct kvm_vcpu *vcpu, u32 vintid,
> > + struct irq_ops *ops)
> > +{
> > + struct vgic_irq *irq = vgic_get_vcpu_irq(vcpu, vintid);
> > +
> > + BUG_ON(!irq);
> > +
> > + scoped_guard(raw_spinlock_irqsave, &irq->irq_lock)
> > + {
> > + irq->ops = ops;
> > + }
>
> nit: opening brace in the wrong spot, and overall not useful. This
> could simply be written as:
>
> scoped_guard(raw_spinlock_irqsave, &irq->irq_lock)
> irq->ops = ops;
Argh, sorry that slipped through!
>
> > +
> > + vgic_put_irq(vcpu->kvm, irq);
> > +}
> > +
> > +void kvm_vgic_clear_irq_ops(struct kvm_vcpu *vcpu, u32 vintid)
> > +{
> > + struct vgic_irq *irq = vgic_get_vcpu_irq(vcpu, vintid);
> > +
> > + BUG_ON(!irq);
> > +
> > + scoped_guard(raw_spinlock_irqsave, &irq->irq_lock)
> > + {
> > + irq->ops = NULL;
> > + }
> > +
> > + vgic_put_irq(vcpu->kvm, irq);
> > +}
> > +
>
> nit: that could also be written as:
>
> void kvm_vgic_clear_irq_ops(struct kvm_vcpu *vcpu, u32 vintid)
> {
> kvm_vgic_set_irq_ops(vcpu, vintid, NULL);
> }
Ah, that is indeed cleaner.
>
> I can fix all of it when applying if that works for you.
If you're happy to do that, that is great! Do note what I said above
regarding the order of vcpu and timer init!
Thanks,
Sascha
>
> Thanks,
>
> M.
>
More information about the linux-arm-kernel mailing list