[PATCH v4 35/49] KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive

Marc Zyngier maz at kernel.org
Tue Apr 28 13:37:55 PDT 2026


On Sun, 26 Apr 2026 15:07:36 +0100,
Marc Zyngier <maz at kernel.org> wrote:
> 
> On Sun, 26 Apr 2026 10:14:11 +0100,
> Marc Zyngier <maz at kernel.org> wrote:
> > 
> > On Wed, 22 Apr 2026 15:57:44 +0100,
> > Vishnu Pajjuri <vishnu at os.amperecomputing.com> wrote:
> > > 
> > > Hi Marc,
> > > 
> > > On 22-04-2026 12:25, Marc Zyngier wrote:
> > > >
> > > > Have you made progress on this? I can't reproduce it at all despite my
> > > > best effort. I'm perfectly happy to help, but you need to give me
> > > > *something* to go on.
> > > 
> > > 
> > > Thanks for your support!!
> > > 
> > > The issue is triggered as soon as the timer interrupt (IRQ 27) is
> > > deactivated. Preventing the deactivation of IRQ 27 during nested VGIC
> > > state transitions prevents the failure from reproducing.
> > 
> > Which level of deactivation? From L2 to L1? Or L1 to L0? The former
> > should just be an update to the irq structure, while the latter is
> > effectively a write to ICC_DIR_EL1, and *that* is a new behaviour
> > introduced by this patch.
> > 
> > I wonder if your implementation is such that ICC_DIR_EL1 is trapped by
> > ICH_HCR_EL2.TDIR, which is allowed by the architecture, but which
> > neither of the two implementations I have actually enforces (the trap
> > only applies to ICV_DIR_EL1). Can you try the hack below which disables
> > the traps much earlier, and let me know if that helps?
> > 
> > Even if that's the case, this should result in an EL2->EL2 exception,
> > and that should be caught as an unhandled exception in entry-common.c,
> > so something else is afoot.
> 
> Actually, this should never happen. ICH_HCR_EL2.TDIR is constructed
> like all the other GICv3 trap bits, in the sense that it only traps
> accesses from EL1, not EL2 (for sanity reasons, I'm not considering
> the possibility of a trap to EL3...).
> 
> Still, I'm interested in finding out if that hack helps at all.

Any update?

I've been racking my brain on this one, and I may have another
potential avenue to explore. Does your CPU implement SEIS
(ICC_CTLR_EL1.SEIS == 1)?
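
For anyone wanting to check this from a raw register dump, here is a
minimal sketch of decoding that bit. It assumes you already have the
ICC_CTLR_EL1 value in hand (how you obtain it is not shown), and the
helper name icc_ctlr_has_seis() is hypothetical, not an existing kernel
function. SEIS sits at bit 14 of ICC_CTLR_EL1.

```c
#include <stdbool.h>
#include <stdint.h>

/* SEIS is bit 14 of ICC_CTLR_EL1 in the GICv3 architecture. */
#define ICC_CTLR_EL1_SEIS_SHIFT	14
#define ICC_CTLR_EL1_SEIS_MASK	(UINT64_C(1) << ICC_CTLR_EL1_SEIS_SHIFT)

/*
 * Hypothetical helper (illustration only): returns true if the given
 * raw ICC_CTLR_EL1 value advertises System Error Interrupt support.
 */
static bool icc_ctlr_has_seis(uint64_t icc_ctlr)
{
	return (icc_ctlr & ICC_CTLR_EL1_SEIS_MASK) != 0;
}
```

A non-zero result here is what would make the double deactivation
described below worth worrying about.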

I think there is a corner case where we can end up doing a double
deactivation. On its own, that's a completely harmless thing to
do. Except on an implementation with SEIS, which could deliver an
SError and route that to EL3. Add quality firmware to the mix, and you
could be in for a treat.

Obviously, implementing SEIS is a terrible and pointless thing to do
(it's been long deprecated), but in case someone felt adventurous in
the RTL department, let me know what the following hack does for you.

	M.

diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 9e841e7afd4a7..c92ca6969e85f 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -275,7 +275,7 @@ void vgic_v3_deactivate(struct kvm_vcpu *vcpu, u64 val)
 		lr = vgic_v3_compute_lr(vcpu, irq) & ~ICH_LR_ACTIVE_BIT;
 	}
 
-	if (lr & ICH_LR_HW)
+	if ((lr & ICH_LR_HW) && !vgic_state_is_nested(vcpu))
 		vgic_v3_deactivate_phys(FIELD_GET(ICH_LR_PHYS_ID_MASK, lr));
 
 	vgic_v3_fold_lr(vcpu, lr);

-- 
Jazz isn't dead. It just smells funny.
