[PATCH] KVM: arm64: nv: Translate vEL2 PSTATE to EL1 in kvm_hyp_handle_mops()

swing sze bestswngs at gmail.com
Tue Jun 16 19:52:43 PDT 2026


On Wed, Jun 17, 2026 at 04:14, Oliver Upton <oupton at kernel.org> wrote:
>
> Hi Weiming,
>
> Thanks for the fix.
>
> On Tue, Jun 16, 2026 at 07:49:44PM +0800, Weiming Shi wrote:
> > When a nested virtualisation guest is running its virtual EL2 (vEL2),
> > fixup_guest_exit() rewrites vcpu_cpsr() to the guest's virtual exception
> > level: a hardware PSTATE.M of EL1{t,h} is presented as EL2{t,h}. The
> > hardware, however, executes vEL2 at EL1.
> >
> > kvm_hyp_handle_mops() runs on the fast guest re-entry path, where it
> > clears the single-step bit and restores SPSR_EL2 directly from
> > vcpu_cpsr():
> >
> >       *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
> >       write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
> >
> > For a guest hypervisor this writes the vEL2 view (PSTATE.M == EL2h) into
> > the hardware SPSR_EL2 without translating it back. The fast path re-enters
> > the guest via __guest_enter()/ERET without going through
> > __sysreg_restore_el2_return_state(), so neither to_hw_pstate() nor the
> > "return to a less privileged mode" safety check there (which would set
> > PSR_IL_BIT) is applied. The ERET therefore restores PSTATE.M = EL2h and
> > re-enters the guest at the real EL2 with a guest-controlled ELR, escaping
> > stage-2 and the guest/host boundary.
> >
> > This is reachable on a kernel with FEAT_MOPS running a KVM nested guest
> > (kvm-arm.mode=nested): KVM sets HCRX_EL2.MCE2, which the guest hypervisor
> > cannot clear for its own context (is_nested_ctxt() is false), so a vEL2
> > MOPS exception is taken to the host and dispatched to kvm_hyp_handle_mops()
> > with VCPU_IN_HYP_CONTEXT set.
> >
> > Translate EL2{t,h} back to EL1{t,h} before writing SPSR_EL2, mirroring
> > kvm_hyp_handle_eret(). For non-nested guests vcpu_cpsr() never holds an
> > EL2 mode, so the translation is a no-op and behaviour is unchanged.
>
> The changelog is unnecessarily verbose; instead:
>
>   kvm_hyp_handle_mops() resets the single-step state machine as part of
>   rewinding state for a MOPS exception by modifying vcpu_cpsr() and
>   writing the result directly into hardware.
>
>   In the case of nested virtualization, vcpu_cpsr() is a synthetic value
>   such that the rest of KVM can deal with vEL2 cleanly. That means the
>   value requires translation before being written into hardware, which is
>   unfortunately missing from the MOPS handler.
>
>   Fix it by directly modifying SPSR_EL2 and avoiding the synthetic state
>   altogether, which will be resynchronized on the next 'full' exit back
>   to KVM.
>
> Also:
>
> Cc: stable at vger.kernel.org
>
> Definitely meets the bar :)
>
> > Fixes: 2de451a329cf ("KVM: arm64: Add handler for MOPS exceptions")
> > Assisted-by: Claude:claude-opus-4-8
> > Reported-by: Zhong Wang <wangzhong.c0ss4ck at bytedance.com>
> > Reported-by: Xuanqing Shi <shixuanqing.11 at bytedance.com>
> > Signed-off-by: Weiming Shi <bestswngs at gmail.com>
> > ---
> >  arch/arm64/kvm/hyp/include/hyp/switch.h | 23 ++++++++++++++++++++++-
> >  1 file changed, 22 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> > index e9b36a3b27bbc..a6b7963ddbf0b 100644
> > --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> > +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> > @@ -448,6 +448,8 @@ static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
> >
> >  static inline bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code)
> >  {
> > +     u64 spsr, mode;
> > +
> >       *vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
> >       arm64_mops_reset_regs(vcpu_gp_regs(vcpu), vcpu->arch.fault.esr_el2);
> >       write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
> > @@ -457,7 +459,26 @@ static inline bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code)
> >        * instruction.
> >        */
> >       *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
> > -     write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
> > +
> > +     /*
> > +      * For a guest hypervisor, vcpu_cpsr() holds the vEL2 view
> > +      * (PSTATE.M == EL2h) installed by fixup_guest_exit(), but vEL2
> > +      * runs at EL1. Translate it back before restoring SPSR_EL2, as in
> > +      * kvm_hyp_handle_eret().
> > +      */
> > +     spsr = *vcpu_cpsr(vcpu);
> > +     mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
> > +     switch (mode) {
> > +     case PSR_MODE_EL2t:
> > +             mode = PSR_MODE_EL1t;
> > +             break;
> > +     case PSR_MODE_EL2h:
> > +             mode = PSR_MODE_EL1h;
> > +             break;
> > +     }
> > +     spsr = (spsr & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
> > +
> > +     write_sysreg_el2(spsr, SYS_SPSR);
>
> As I allude to in the modified changelog, I'd rather we just manipulate
> the hardware value of SPSR_EL2 directly. We already do this in
> kvm_hyp_handle_eret():
>
>         spsr = read_sysreg_el2(SYS_SPSR);
>         write_sysreg_el2(spsr & ~DBG_SPSR_SS, SYS_SPSR);
>
> Thanks,
> Oliver

Hi Oliver,

Thanks for your review. I will send a v2 later.

Best
Weiming Shi



More information about the linux-arm-kernel mailing list