[PATCHv2] kvm: arm64: Add SVE support for nVHE.
Dave Martin
Dave.Martin at arm.com
Mon Feb 8 09:46:31 EST 2021
On Fri, Feb 05, 2021 at 12:12:51AM +0000, Daniel Kiss wrote:
>
>
> > On 4 Feb 2021, at 18:36, Dave Martin <Dave.Martin at arm.com> wrote:
> >
> > On Tue, Feb 02, 2021 at 07:52:54PM +0100, Daniel Kiss wrote:
> >> CPUs that support SVE are architecturally required to support the
> >> Virtualization Host Extensions (VHE), so until now the kernel has only
> >> supported SVE alongside KVM with VHE enabled. In some cases it is
> >> desirable to run an nVHE config even when VHE is available.
> >> This patch adds support for SVE in the nVHE configuration too.
> >>
> >> Tested on FVP with a Linux guest VM that runs with a different VL than
> >> the host system.
> >>
> >> Signed-off-by: Daniel Kiss <daniel.kiss at arm.com>
[...]
> >> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> >> index 3e081d556e81..8f29b468e989 100644
> >> --- a/arch/arm64/kvm/fpsimd.c
> >> +++ b/arch/arm64/kvm/fpsimd.c
> >> @@ -42,6 +42,16 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
> >> if (ret)
> >> goto error;
> >>
> >> + if (!has_vhe() && vcpu->arch.sve_state) {
> >> + void *sve_state_end = vcpu->arch.sve_state +
> >> + SVE_SIG_REGS_SIZE(
> >> + sve_vq_from_vl(vcpu->arch.sve_max_vl));
> >> + ret = create_hyp_mappings(vcpu->arch.sve_state,
> >> + sve_state_end,
> >> + PAGE_HYP);
> >> + if (ret)
> >> + goto error;
> >> + }
> >> vcpu->arch.host_thread_info = kern_hyp_va(ti);
> >> vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
> >> error:
> >> @@ -109,10 +119,22 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
> >> local_irq_save(flags);
> >>
> >> if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
> >> + if (guest_has_sve) {
> >> + if (has_vhe())
> >> + __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL12);
> >> + else {
> >> + __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL1);
> >> + /*
> >> + * vcpu could set ZCR_EL1 to a shorter VL than the max VL but
> >> + * the context is still valid there. Save the whole context.
> >> + * In nVHE case we need to reset the ZCR_EL1 for that
> >> + * because the save will be done in EL1.
> >> + */
> >> + write_sysreg_s(sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1,
> >> + SYS_ZCR_EL1);
> >
> > This still doesn't feel right. We're already in EL1 here I think, in
> > which case ZCR_EL1 has an immediate effect on what state the
> > architecture guarantees to preserve: if we need to change ZCR_EL1, it's
> > because it might be wrong. If it's wrong, it might be too small. And
> > if it's too small, we may have already lost some SVE register bits that
> > the guest cares about.
> "On taking an exception from an Exception level that is more constrained
> to a target Exception level that is less constrained, or on writing a larger value
> to ZCR_ELx.LEN, then the previously inaccessible bits of these registers that
> become accessible have a value of either zero or the value they had before
> executing at the more constrained size.”
> If the CPU zeros the register when ZCR is written or an exception is taken, my
> reading of the above is that the register contents may be lost by the time we land
> in EL2. No code should count on the register contents after reducing the VL in ZCR.
>
> I see my comment is also not clear enough.
> Maybe we shouldn't save up to the guest's sve_max_vl here; it would be enough to
> save up to the actual VL.
Maybe you're right, but I may be missing some information here.
Can you sketch out more explicitly how it works, showing how all the
bits the host cares about (and only those bits) are saved/restored for
the host, and how all the bits the guest cares about (and only those
bits) are saved/restored for the guest?
Various optimisations are possible, but there is a risk of breaking
assumptions elsewhere. For example, the KVM_{SET,GET}_ONE_REG code
makes assumptions about the layout of the data in
vcpu->arch.sve_state.
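To illustrate the kind of assumption I mean, here's a rough sketch (not the
actual guest.c code; the helper name is made up) of how the register slice
offsets inside vcpu->arch.sve_state follow from sve_max_vl:

/*
 * Rough sketch only: byte offset of Zn within vcpu->arch.sve_state,
 * assuming the signal-frame-like layout sized by the vcpu's sve_max_vl.
 */
static unsigned long example_zreg_offset(struct kvm_vcpu *vcpu, unsigned int n)
{
	unsigned int vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);

	/* sve_state starts at the regs block, so drop the frame header offset */
	return SVE_SIG_ZREG_OFFSET(vq, n) - SVE_SIG_REGS_OFFSET;
}

If the save path sized or placed things based on the guest's current runtime
VL instead, user-space could end up reading stale or misplaced data through
KVM_{SET,GET}_ONE_REG.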
The architectural rules about when SVE register bits are preserved are also
complex, with many interactions. We also don't want to aggressively optimise in
a way that might be hard to apply to nested virt.
My instinct is to keep it simple while this patch matures, and continue
to save/restore based on vcpu->arch.sve_max_vl. This keeps a clear
split between the emulated "hardware" (which doesn't change while the
guest runs), and changeable runtime state (like the guest's ZCR_EL1).
I'm happy to review proposed optimisations, but I think those should be
separated out as later patches (or a separate series). My experience
is that it's much easier to get this wrong than to get it right!
When it's wrong, it can be a nightmare to debug.
> > I think that we need to handle this on our way out of hyp instead,
> > _before_ returning back to EL1.
> >
> > When at EL2 exiting back to the host: if and only if
> > KVM_ARM64_FP_ENABLED is set then save the guest's ZCR_EL1 and set ZCR_EL1 to
> > match the guest's sve_max_vl (possibly by just cloning ZCR_EL2).
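To spell that out, I mean something like the following, called from the nVHE
exit path while still at EL2 (untested sketch only; the function name is
invented and the details would need checking):

static void __sketch_save_guest_zcr(struct kvm_vcpu *vcpu)
{
	if (!vcpu_has_sve(vcpu) ||
	    !(vcpu->arch.flags & KVM_ARM64_FP_ENABLED))
		return;

	/* Snapshot the VL the guest actually selected... */
	__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL1);

	/*
	 * ...then widen ZCR_EL1 to the guest's sve_max_vl so that the
	 * deferred save done later by the host at EL1 covers the whole
	 * guest context.
	 */
	write_sysreg_s(sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1, SYS_ZCR_EL1);
}

That way the EL1 side never needs to write ZCR_EL1 before the guest state has
been captured.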
> >
> >> + }
> >> + }
> >> fpsimd_save_and_flush_cpu_state();
> >> -
> >> - if (guest_has_sve)
> >> - __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL12);
> >> } else if (host_has_sve) {
> >> /*
> >> * The FPSIMD/SVE state in the CPU has not been touched, and we
[...]
> >> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> >> index f3d0e9eca56c..df9e912d1278 100644
> >> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> >> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> >> @@ -45,6 +45,18 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
> >> if (!update_fp_enabled(vcpu)) {
> >> val |= CPTR_EL2_TFP;
> >> __activate_traps_fpsimd32(vcpu);
> >> + } else {
> >> + if (vcpu_has_sve(vcpu)) {
> >> + /*
> >> + * The register access will not be trapped so restore
> >> + * ZCR_EL1/ZCR_EL2 because those were set for the host.
> >> + */
> >> + write_sysreg_s(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR_EL1);
> >> + write_sysreg_s(
> >> + sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1,
> >> + SYS_ZCR_EL2);
> >> + val &= ~CPTR_EL2_TZ;
> >> + }
> >> }
> >>
> >> write_sysreg(val, cptr_el2);
> >> @@ -110,6 +122,17 @@ static void __load_host_stage2(void)
> >> write_sysreg(0, vttbr_el2);
> >> }
> >>
> >> +static void __restore_host_sve_state(struct kvm_vcpu *vcpu)
> >> +{
> >> + /*
> >> + * If the guest uses SVE, the ZCR_EL2 was configured for the guest.
> >> + * Host might save the context in EL1 but for that the ZCR_EL2 need
> >> + * to be reset to the host's default.
> >
> > This isn't just about saving the guest context correctly. The host may have
> > been using larger vectors than the guest's sve_max_vl allows.
> Right.
Can you clarify the comment please (unless you've done so already)?
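Something along these lines would capture both points (the wording is only a
suggestion):

	/*
	 * ZCR_EL2 is currently limiting the accessible vector length to the
	 * guest's sve_max_vl, but the host may have been using longer
	 * vectors, and the host's deferred EL1 save of the guest context
	 * must not be truncated either.  Open ZCR_EL2 back up to the
	 * host's default before returning to the host.
	 */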
> >
> >> + */
> >> + if (vcpu_has_sve(vcpu) && (vcpu->arch.flags & KVM_ARM64_FP_ENABLED))
> >> + write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
> >
> > I'm not sure if it's worth having a special #define for this, but it
> > would be a good idea at least to put comments here and in el2_setup.h to
> > remind people that the ZCR_EL2 settings need to match. Otherwise this
> > might get mis-maintained in the future.
> I will add a comment to el2_setup.h
Can you put comments in both places please? Maintainers of either bit
of code need to be aware of the other code.
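For example (wording is only a suggestion), the restore site could carry:

	/*
	 * Re-open ZCR_EL2 to the maximum VL for the host.  This must stay
	 * in sync with the ZCR_EL2 initialisation in el2_setup.h.
	 */
	write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);

with a matching note next to the ZCR_EL2 setup in el2_setup.h saying that
KVM's nVHE switch code relies on restoring this same value when handing
control back to the host.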
[...]
> >> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> >> index 47f3f035f3ea..17cc5e87adcd 100644
> >> --- a/arch/arm64/kvm/reset.c
> >> +++ b/arch/arm64/kvm/reset.c
> >> @@ -74,10 +74,6 @@ static int kvm_vcpu_enable_sve(struct kvm_vcpu *vcpu)
> >> if (!system_supports_sve())
> >> return -EINVAL;
> >>
> >> - /* Verify that KVM startup enforced this when SVE was detected: */
> >> - if (WARN_ON(!has_vhe()))
> >> - return -EINVAL;
> >> -
> >> vcpu->arch.sve_max_vl = kvm_sve_max_vl;
> >>
> >> /*
> >> @@ -113,7 +109,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
> >> buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
> >> if (!buf)
> >> return -ENOMEM;
> >> -
> >> + __vcpu_sys_reg(vcpu, ZCR_EL1) = sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1;
> >
> > What's this for?
> >
> > __vcpu_sys_reg(vcpu, ZCR_EL1) should already be being reset sensibly
> > somewhere. If not, that would be a bug…
> It is not needed indeed.
Ah, OK.
Cheers
---Dave