[PATCH v3 3/5] KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation

Fuad Tabba tabba at google.com
Mon Nov 17 03:37:30 PST 2025


On Mon, 17 Nov 2025 at 11:34, Marc Zyngier <maz at kernel.org> wrote:
>
> On Mon, 17 Nov 2025 11:24:24 +0000,
> Fuad Tabba <tabba at google.com> wrote:
> >
> > Hi Marc,
> >
> >
> > On Mon, 17 Nov 2025 at 09:15, Marc Zyngier <maz at kernel.org> wrote:
> > >
> > > The current approach to nested GICv3 support is to not do anything
> > > while L2 is running, wait for a transition from L2 to L1 to resync
> > > the LRs, VMCR and HCR, and only then evaluate the state to decide
> > > whether to generate a maintenance interrupt.
> > >
> > > This doesn't provide a good quality of emulation, and it would be
> > > far preferable to find out early that we need to perform a switch.
> > >
> > > Move the LRs/VMCR and HCR resync into vgic_v3_sync_nested(), so
> > > that we have most of the state available. As we are turning the
> > > vgic off at this stage to avoid a screaming host MI, add a new
> > > helper vgic_v3_flush_nested() that switches the vgic on again. The
> > > MI can then be directly injected as required.
> > >
> > > Signed-off-by: Marc Zyngier <maz at kernel.org>
> > > ---
> > >  arch/arm64/include/asm/kvm_hyp.h     |  1 +
> > >  arch/arm64/kvm/hyp/vgic-v3-sr.c      |  2 +-
> > >  arch/arm64/kvm/vgic/vgic-v3-nested.c | 69 ++++++++++++++++------------
> > >  arch/arm64/kvm/vgic/vgic.c           |  6 ++-
> > >  arch/arm64/kvm/vgic/vgic.h           |  1 +
> > >  5 files changed, 46 insertions(+), 33 deletions(-)
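
Since the vgic.c hunk didn't make it into my quoting below, let me
spell out how I understand the wiring after this patch. This is my own
sketch of the two entry points in vgic.c, not the actual hunk (the
non-nested paths are elided, and the comments paraphrase the commit
message):

	/* L2 entry: everything was already loaded on vcpu_load, so we
	 * only switch the vgic back on, vgic_v3_sync_nested() having
	 * left it off on the previous exit to avoid a screaming MI. */
	void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
	{
		if (vgic_state_is_nested(vcpu)) {
			vgic_v3_flush_nested(vcpu);
			return;
		}

		/* ... non-nested flush path unchanged ... */
	}

	/* L2 exit: resync LRs/VMCR/HCR early, with the vgic turned
	 * off, and inject a maintenance interrupt directly if needed. */
	void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
	{
		if (vgic_state_is_nested(vcpu)) {
			vgic_v3_sync_nested(vcpu);
			return;
		}

		/* ... non-nested sync path unchanged ... */
	}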
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> > > index dbf16a9f67728..76ce2b94bd97e 100644
> > > --- a/arch/arm64/include/asm/kvm_hyp.h
> > > +++ b/arch/arm64/include/asm/kvm_hyp.h
> > > @@ -77,6 +77,7 @@ DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> > >  int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
> > >
> > >  u64 __gic_v3_get_lr(unsigned int lr);
> > > +void __gic_v3_set_lr(u64 val, int lr);
> > >
> > >  void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
> > >  void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
> > > diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> > > index 71199e1a92940..99342c13e1794 100644
> > > --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> > > +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> > > @@ -60,7 +60,7 @@ u64 __gic_v3_get_lr(unsigned int lr)
> > >         unreachable();
> > >  }
> > >
> > > -static void __gic_v3_set_lr(u64 val, int lr)
> > > +void __gic_v3_set_lr(u64 val, int lr)
> > >  {
> > >         switch (lr & 0xf) {
> > >         case 0:
> > > diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
> > > index 17bceef83269e..bf37fd3198ba7 100644
> > > --- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
> > > +++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
> > > @@ -70,13 +70,14 @@ static int lr_map_idx_to_shadow_idx(struct shadow_if *shadow_if, int idx)
> > >   * - on L2 put: perform the inverse transformation, so that the result of L2
> > >   *   running becomes visible to L1 in the VNCR-accessible registers.
> > >   *
> > > - * - there is nothing to do on L2 entry, as everything will have happened
> > > - *   on load. However, this is the point where we detect that an interrupt
> > > - *   targeting L1 and prepare the grand switcheroo.
> > > + * - there is nothing to do on L2 entry apart from enabling the vgic, as
> > > + *   everything will have happened on load. However, this is the point where
> > > + *   we detect an interrupt targeting L1 and prepare the grand
> > > + *   switcheroo.
> > >   *
> > > - * - on L2 exit: emulate the HW bit, and deactivate corresponding the L1
> > > - *   interrupt. The L0 active state will be cleared by the HW if the L1
> > > - *   interrupt was itself backed by a HW interrupt.
> > > + * - on L2 exit: resync the LRs and VMCR, emulate the HW bit, and deactivate
> > > + *   the corresponding L1 interrupt. The L0 active state will be cleared by
> > > + *   the HW if the L1 interrupt was itself backed by a HW interrupt.
> > >   *
> > >   * Maintenance Interrupt (MI) management:
> > >   *
> > > @@ -265,15 +266,30 @@ static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu,
> > >         s_cpu_if->used_lrs = hweight16(shadow_if->lr_map);
> > >  }
> > >
> > > +void vgic_v3_flush_nested(struct kvm_vcpu *vcpu)
> > > +{
> > > +       u64 val = __vcpu_sys_reg(vcpu, ICH_HCR_EL2);
> > > +
> > > +       write_sysreg_s(val | vgic_ich_hcr_trap_bits(), SYS_ICH_HCR_EL2);
> > > +}
> > > +
> > >  void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
> > >  {
> > >         struct shadow_if *shadow_if = get_shadow_if();
> > >         int i;
> > >
> > >         for_each_set_bit(i, &shadow_if->lr_map, kvm_vgic_global_state.nr_lr) {
> > > -               u64 lr = __vcpu_sys_reg(vcpu, ICH_LRN(i));
> > > +               u64 val, host_lr, lr;
> > >                 struct vgic_irq *irq;
> > >
> > > +               host_lr = __gic_v3_get_lr(lr_map_idx_to_shadow_idx(shadow_if, i));
> > > +
> > > +               /* Propagate the new LR state */
> > > +               lr = __vcpu_sys_reg(vcpu, ICH_LRN(i));
> > > +               val = lr & ~ICH_LR_STATE;
> > > +               val |= host_lr & ICH_LR_STATE;
> > > +               __vcpu_assign_sys_reg(vcpu, ICH_LRN(i), val);
> > > +
> >
> > As I said before, I am outside of my comfort zone here. However,
> > should the following check be changed to use the merged 'val', rather
> > than the guest lr as it was?
>
> [...]
>
> >
> > >                 if (!(lr & ICH_LR_HW) || !(lr & ICH_LR_STATE))
> > >                         continue;
>
> No, this decision must be taken based on the *original* state, before
> the L2 guest was run. If the LR was in an invalid state in the first
> place, there is nothing to do.
>
> > >
> > > @@ -286,12 +302,21 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
> > >                 if (WARN_ON(!irq)) /* Shouldn't happen as we check on load */
> > >                         continue;
> > >
> > > -               lr = __gic_v3_get_lr(lr_map_idx_to_shadow_idx(shadow_if, i));
> > > -               if (!(lr & ICH_LR_STATE))
> > > +               if (!(host_lr & ICH_LR_STATE))
> > >                         irq->active = false;
>
> And here, if we see that the *new* state (as fished out of the HW LRs)
> is now invalid, this means that a deactivation has taken place in L2,
> and we must propagate it to L1.

Thanks for the clarification.
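
To spell out for the archive what I had missed, here is my paraphrase
of the loop body above, with the two states named explicitly ('lr' is
L1's view from before L2 ran, 'host_lr' is what the HW LR holds after
L2 ran, and only the state field is merged back into L1's copy):

	/* The state merge: keep L1's view of the LR, refreshing only
	 * ICH_LR_STATE from the hardware LR that L2 actually ran with. */
	val = (lr & ~ICH_LR_STATE) | (host_lr & ICH_LR_STATE);

	/* Gate on the *original* state: if L1 hadn't set up a valid
	 * HW-backed LR in the first place, there is nothing to do. */
	if (!(lr & ICH_LR_HW) || !(lr & ICH_LR_STATE))
		continue;

	/* Judge deactivation on the *new* state: the LR having gone
	 * invalid while L2 ran means a deactivation happened in L2,
	 * which must be propagated to the L1 interrupt. */
	if (!(host_lr & ICH_LR_STATE))
		irq->active = false;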

Reviewed-by: Fuad Tabba <tabba at google.com>

Cheers,
/fuad

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.


