[PATCH] kvm: arm64: Add SVE support for nVHE.
Daniel Kiss
Daniel.Kiss at arm.com
Tue Feb 2 13:44:43 EST 2021
Thanks for the comments; I’m going to send v2 soon.
Thanks
Daniel
> On 26 Jan 2021, at 16:49, Dave Martin <Dave.Martin at arm.com> wrote:
>
> On Fri, Jan 22, 2021 at 06:21:21PM +0000, Marc Zyngier wrote:
>> Daniel,
>>
>> Please consider cc'ing the maintainer (me) as well as the KVM/arm64
>> reviewers (Julien, James, Suzuki) and the kvmarm list (all now Cc'd).
>>
>> On 2021-01-22 01:07, Daniel Kiss wrote:
>>> CPUs that support SVE are architecturally required to support the
>>> Virtualization Host Extensions (VHE), so until now the kernel has
>>> supported SVE alongside KVM only with VHE enabled. In some cases it is
>>> desirable to run the nVHE configuration even when VHE is available.
>>> This patch adds support for SVE in the nVHE configuration too.
>>>
>>> In the case of nVHE the system registers behave a bit differently.
>>> ZCR_EL2 effectively defines the maximum vector length that can be set
>>> in ZCR_EL1. To limit the vector length for the guest, ZCR_EL2 needs
>>> to be set accordingly, and therefore it becomes part of the context.
>>
>> Not really. It's just part of the *hypervisor* state for this guest,
>> and not part of the guest state. Not different from HCR_EL2, for example.
>
> Also, ZCR_EL2 doesn't affect what can be written in ZCR_EL1, so this
> might be reworded to say that it just limits the effective vector length
> available to the guest.
Okay, I found a way to switch ZCR_EL2 without adding it to the context.
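The rough shape (a sketch only; the helper name is made up and v2 may
differ): derive the value from the vcpu's maximum VL on the switch path,
so nothing needs to live in the sysreg array.

	/* Sketch: program ZCR_EL2 from sve_max_vl at switch time. */
	static void __sve_set_guest_zcr_el2(struct kvm_vcpu *vcpu)
	{
		if (vcpu_has_sve(vcpu))
			write_sysreg_s(sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1,
				       SYS_ZCR_EL2);
	}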
>>> The sve_state will be loaded in EL2, so it needs to be mapped, and
>>> during the load ZCR_EL2 will control the vector length.
>>> Trap control is similar to the VHE case except the bit values are the
>>> opposite: ZCR_EL1 access trapping with VHE uses a ZEN value of 0, but
>>> in the nVHE case TZ needs to be set to 1 to trigger the exception.
>>> Trap control needs to be respected during the context switch even in EL2.
>>
>> Isn't that exactly the same as FPSIMD accesses?
>
> (Yes, this isn't really new. It might be best to let the code speak for
> itself on that point, rather than trying to explain it in the commit
> message.)
>
>>
>>>
>>> Tested on FVP with a Linux guest VM that runs with a different VL than
>>> the host system.
>>>
>>> This patch requires sve_set_vq from
>>> - arm64/sve: Rework SVE trap access to minimise memory access
>>
>> Care to add a pointer to this patch? This also shouldn't be part
>> of the commit message. When you repost it, please include the
>> other patch as a series unless it has already been merged by then.
Thanks, I will do it like this next time.
With the new changes it seems that patch is not even needed.
>>>
>>> Signed-off-by: Daniel Kiss <daniel.kiss at arm.com>
>>> ---
>>> arch/arm64/Kconfig | 7 ----
>>> arch/arm64/include/asm/fpsimd.h | 4 ++
>>> arch/arm64/include/asm/fpsimdmacros.h | 38 +++++++++++++++++
>>> arch/arm64/include/asm/kvm_host.h | 19 +++------
>>> arch/arm64/kernel/fpsimd.c | 11 +++++
>>> arch/arm64/kvm/arm.c | 5 ---
>>> arch/arm64/kvm/fpsimd.c | 22 +++++++---
>>> arch/arm64/kvm/hyp/fpsimd.S | 15 +++++++
>>> arch/arm64/kvm/hyp/include/hyp/switch.h | 48 ++++++++++++----------
>>> arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 26 ++++++++++++
>>> arch/arm64/kvm/hyp/nvhe/switch.c | 6 ++-
>>> arch/arm64/kvm/reset.c | 8 ++--
>>> 12 files changed, 153 insertions(+), 56 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index a6b5b7ef40ae..f17ab095e99f 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -1692,7 +1692,6 @@ endmenu
>>> config ARM64_SVE
>>> bool "ARM Scalable Vector Extension support"
>>> default y
>>> - depends on !KVM || ARM64_VHE
>>> help
>>> The Scalable Vector Extension (SVE) is an extension to the AArch64
>>> execution state which complements and extends the SIMD functionality
>>> @@ -1721,12 +1720,6 @@ config ARM64_SVE
>>> booting the kernel. If unsure and you are not observing these
>>> symptoms, you should assume that it is safe to say Y.
>>>
>>> - CPUs that support SVE are architecturally required to support the
>>> - Virtualization Host Extensions (VHE), so the kernel makes no
>>> - provision for supporting SVE alongside KVM without VHE enabled.
>>> - Thus, you will need to enable CONFIG_ARM64_VHE if you want to support
>>> - KVM in the same kernel image.
>>> -
>>> config ARM64_MODULE_PLTS
>>> bool "Use PLTs to allow module memory to spill over into vmalloc area"
>>> depends on MODULES
>>> diff --git a/arch/arm64/include/asm/fpsimd.h
>>> b/arch/arm64/include/asm/fpsimd.h
>>> index e60aa4ebb351..e7889f4c5cef 100644
>>> --- a/arch/arm64/include/asm/fpsimd.h
>>> +++ b/arch/arm64/include/asm/fpsimd.h
>>> @@ -69,6 +69,10 @@ static inline void *sve_pffr(struct thread_struct
>>> *thread)
>>> extern void sve_save_state(void *state, u32 *pfpsr);
>>> extern void sve_load_state(void const *state, u32 const *pfpsr,
>>> unsigned long vq_minus_1);
>>> +extern void sve_save_state_nvhe(void *state, u32 *pfpsr,
>>> + unsigned long vq_minus_1);
>>> +extern void sve_load_state_nvhe(void const *state, u32 const *pfpsr,
>>> + unsigned long vq_minus_1);
>>
>> Why do we need two different entry points?
Those will use ZCR_EL2 to control the vector length instead of ZCR_EL1.
I’ll add some comments.
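As a sketch, something along these lines (wording may change in v2):

	/*
	 * nVHE variants of sve_save_state()/sve_load_state(): with VHE
	 * disabled, ZCR_EL1 stays under guest control, so the effective
	 * vector length for the save/restore is programmed via ZCR_EL2
	 * (from vq_minus_1) instead of ZCR_EL1.
	 */
	extern void sve_save_state_nvhe(void *state, u32 *pfpsr,
					unsigned long vq_minus_1);
	extern void sve_load_state_nvhe(void const *state, u32 const *pfpsr,
					unsigned long vq_minus_1);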
>>> extern void sve_flush_live(void);
>>> extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const
>>> *state,
>>> unsigned long vq_minus_1);
>>> diff --git a/arch/arm64/include/asm/fpsimdmacros.h
>>> b/arch/arm64/include/asm/fpsimdmacros.h
>>> index af43367534c7..3a18172ee4f6 100644
>>> --- a/arch/arm64/include/asm/fpsimdmacros.h
>>> +++ b/arch/arm64/include/asm/fpsimdmacros.h
>>> @@ -205,6 +205,17 @@
>>> 921:
>>> .endm
>>>
>>> +/* Update ZCR_EL2.LEN with the new VQ */
>>> +.macro sve_load_vq_nvhe xvqminus1, xtmp, xtmp2
>>> + mrs_s \xtmp, SYS_ZCR_EL2
>>> + bic \xtmp2, \xtmp, ZCR_ELx_LEN_MASK
>>> + orr \xtmp2, \xtmp2, \xvqminus1
>>> + cmp \xtmp2, \xtmp
>>> + b.eq 922f
>>> + msr_s SYS_ZCR_EL2, \xtmp2 //self-synchronising
>>> +922:
>>> +.endm
>>> +
>>> /* Preserve the first 128-bits of Znz and zero the rest. */
>>> .macro _sve_flush_z nz
>>> _sve_check_zreg \nz
>>> @@ -230,6 +241,20 @@
>>> str w\nxtmp, [\xpfpsr, #4]
>>> .endm
>>>
>>> +.macro sve_save_nvhe nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2
>>> + sve_load_vq_nvhe \xvqminus1, x\nxtmp, \xtmp2
>>> + _for n, 0, 31, _sve_str_v \n, \nxbase, \n - 34
>>> + _for n, 0, 15, _sve_str_p \n, \nxbase, \n - 16
>>> + _sve_rdffr 0
>>> + _sve_str_p 0, \nxbase
>>> + _sve_ldr_p 0, \nxbase, -16
>>> +
>>> + mrs x\nxtmp, fpsr
>>> + str w\nxtmp, [\xpfpsr]
>>> + mrs x\nxtmp, fpcr
>>> + str w\nxtmp, [\xpfpsr, #4]
>>> +.endm
>>
>> Please document what this macro does.
ACK.
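A draft of the documentation, assuming the macro keeps its current shape:

	/*
	 * Save the SVE context to the buffer at [x\nxbase] and FPSR/FPCR
	 * to [\xpfpsr], after first updating ZCR_EL2.LEN from \xvqminus1
	 * so the registers are dumped at the intended vector length.
	 * FFR is read through P0, which is reloaded afterwards; clobbers
	 * x\nxtmp and \xtmp2.
	 */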
>>
>>> +
>>> .macro sve_load nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2
>>> sve_load_vq \xvqminus1, x\nxtmp, \xtmp2
>>> _for n, 0, 31, _sve_ldr_v \n, \nxbase, \n - 34
>>> @@ -242,3 +267,16 @@
>>> ldr w\nxtmp, [\xpfpsr, #4]
>>> msr fpcr, x\nxtmp
>>> .endm
>>> +
>>> +.macro sve_load_nvhe nxbase, xpfpsr, xvqminus1, nxtmp, xtmp2
>>> + sve_load_vq_nvhe \xvqminus1, x\nxtmp, \xtmp2
>>> + _for n, 0, 31, _sve_ldr_v \n, \nxbase, \n - 34
>>> + _sve_ldr_p 0, \nxbase
>>> + _sve_wrffr 0
>>> + _for n, 0, 15, _sve_ldr_p \n, \nxbase, \n - 16
>>> +
>>> + ldr w\nxtmp, [\xpfpsr]
>>> + msr fpsr, x\nxtmp
>>> + ldr w\nxtmp, [\xpfpsr, #4]
>>> + msr fpcr, x\nxtmp
>>> +.endm
>
> These macros are virtually the same as some existing macros. We don't
> want to maintain this kind of thing in two places if we can avoid it --
> can you try to keep them unified please?
Done.
>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h
>>> b/arch/arm64/include/asm/kvm_host.h
>>> index 0cd9f0f75c13..8bfac6f52858 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -137,6 +137,7 @@ enum vcpu_sysreg {
>>> ACTLR_EL1, /* Auxiliary Control Register */
>>> CPACR_EL1, /* Coprocessor Access Control */
>>> ZCR_EL1, /* SVE Control */
>>> + ZCR_EL2, /* SVE Control */
>>
>> No. This obviously isn't a guest register, and it has no purpose
>> in this array.
>
> Ack, the value in ZCR_EL2 should just be set based on
> vcpu->arch.sve_max_vl, which is locked after kvm_arm_vcpu_finalize().
> The guest can't change it, so we don't need to re-save this anywhere
> when exiting back to the host: we just need to restore the host value.
>
> We do need somewhere to save the host value for this register while the
> guest is running, though. The value is determined by
> head.S:init_el2_nvhe() (in the __init_el2_nvhe_sve macro), and isn't
> stored anywhere other than in ZCR_EL2 itself. I feel it's better just
> to save and restore that, rather than duplicating the logic that
> determines what it should be for the host -- though obviously it would
> work either way.
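If I read that right, the sketch would be (host_zcr_el2 is a hypothetical
slot for the host value, not a vcpu sysreg):

	u64 host_zcr_el2;

	/* Entry: stash the host value set up by __init_el2_nvhe_sve. */
	host_zcr_el2 = read_sysreg_s(SYS_ZCR_EL2);
	write_sysreg_s(sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1,
		       SYS_ZCR_EL2);

	/* ... run the guest ... */

	/* Exit: the guest cannot have changed it; put the host value back. */
	write_sysreg_s(host_zcr_el2, SYS_ZCR_EL2);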
>
>>
>>> TTBR0_EL1, /* Translation Table Base Register 0 */
>>> TTBR1_EL1, /* Translation Table Base Register 1 */
>>> TCR_EL1, /* Translation Control Register */
>>> @@ -283,6 +284,7 @@ struct vcpu_reset_state {
>>> struct kvm_vcpu_arch {
>>> struct kvm_cpu_context ctxt;
>>> void *sve_state;
>>> + void *sve_state_hyp;
>
> [...]
>
>>> unsigned int sve_max_vl;
>>>
>>> /* Stage 2 paging state used by the hardware on next switch */
>>> @@ -386,6 +388,10 @@ struct kvm_vcpu_arch {
>>> #define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) +
>>> \
>>> sve_ffr_offset((vcpu)->arch.sve_max_vl)))
>>>
>>> +#define vcpu_sve_pffr_hyp(vcpu) \
>>> + ((void *)((char *)((vcpu)->arch.sve_state_hyp) + \
>>> + sve_ffr_offset((vcpu)->arch.sve_max_vl)))
>>
>> Why can't you have a single macro that embeds kern_hyp_va()?
Done.
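The single macro now looks roughly like this (a sketch; it relies on the
hyp mapping being linear so the offset can be applied before kern_hyp_va()):

	#define vcpu_sve_pffr(vcpu) ({					\
		void *__pffr = (void *)((char *)((vcpu)->arch.sve_state) + \
				sve_ffr_offset((vcpu)->arch.sve_max_vl)); \
		has_vhe() ? __pffr : kern_hyp_va(__pffr);		\
	})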
>>
>>> +
>>> #define vcpu_sve_state_size(vcpu) ({ \
>>> size_t __size_ret; \
>>> unsigned int __vcpu_vq; \
>>> @@ -579,19 +585,6 @@ static inline void
>>> kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
>>> ctxt_sys_reg(cpu_ctxt, MPIDR_EL1) = read_cpuid_mpidr();
>>> }
>>>
>>> -static inline bool kvm_arch_requires_vhe(void)
>>> -{
>>> - /*
>>> - * The Arm architecture specifies that implementation of SVE
>>> - * requires VHE also to be implemented. The KVM code for arm64
>>> - * relies on this when SVE is present:
>>> - */
>>> - if (system_supports_sve())
>>> - return true;
>>> -
>>> - return false;
>>> -}
>>> -
>>> void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu);
>>>
>>> static inline void kvm_arch_hardware_unsetup(void) {}
>>> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
>>> index 062b21f30f94..1f94c5ca4943 100644
>>> --- a/arch/arm64/kernel/fpsimd.c
>>> +++ b/arch/arm64/kernel/fpsimd.c
>>> @@ -308,6 +308,17 @@ static void fpsimd_save(void)
>>>
>>> if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
>>> if (system_supports_sve() && test_thread_flag(TIF_SVE)) {
>>> +
>>> + if (!has_vhe() && (sve_get_vl() != last->sve_vl)) {
>>> + /* last->sve_state could belong to a vcpu, nVHE
>>> + case the context is loaded in EL2 where ZCR_EL2
>>> + is used. In this context the ZCR_EL1 is used but
>>> + it could be different because it was not
>>> + set when the vcpu context is loaded. Let's set
>>> + it to the guest's VL.
>>> + */
>>
>> Please use the kernel comment style.
>>
>>> + sve_set_vq(sve_vq_from_vl(last->sve_vl) - 1);
>>> + }
>
> On the surface of it, this looks like it's in the wrong place -- there
> shouldn't really be any need for KVM-specific hacks in the core context
> switch code.
>
> Thinking about this, don't we always save out any guest state in
> kvm_arch_vcpu_put_fp(), no? That code calls
> fpsimd_save_and_flush_cpu_state(), so TIF_FOREIGN_FPSTATE should always
> be set when reenabling preemption and exiting the KVM run loop. The
> core context switch code should never see dangling guest state.
>
> There's potential for optimisation here -- we could avoid flushing the
> guest state, but just save it. This would allow the state to be reused
> if the next user of FPSIMD/SVE on this cpu is the KVM run loop. I
> think we figured that the extra complexity wasn't worth it, since
> exiting the guest is already costly and KVM does its best to avoid it
> -- the additional cost of reloading the guest vector regs may therefore
> not be worth caring about.
>
> (Disclaimer -- my understanding of the code may be a bit out of date, so
> shout if the above sounds suspicious!)
>
>
> So in conclusion, actually I'm wondering whether we need to do anything
> at all here. Which is just as well, since fudging ZCR.LEN here would be
> broken anyway: if it was already too small, the extra bits of the SVE
> vector registers could have been thrown away by the hardware already.
last->sve_vl will be set to the VL of the vcpu. In the VHE case this is
fine because the save runs with ZCR_EL2, which holds the maximum VL, so
the value in ZCR_EL1 does not matter.
In the nVHE case the save will use ZCR_EL1, which is managed by the guest.
The guest is allowed to set it to other values, and does so if
/proc/sys/abi/sve_default_vector_length is set.
This can be managed in the KVM part, where ZCR_EL1 is saved.
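Concretely, I am thinking of something like this on the hyp save path
(sketch only):

	/*
	 * nVHE guest exit: the guest may have lowered ZCR_EL1, but the
	 * layout of vcpu->arch.sve_state follows sve_max_vl, so save
	 * the guest's ZCR_EL1 separately and dump the registers at the
	 * vcpu's maximum VL.
	 */
	__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL1);
	sve_save_state_nvhe(vcpu_sve_pffr(vcpu),
			    &vcpu->arch.ctxt.fp_regs.fpsr,
			    sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1);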
>
>>> if (WARN_ON(sve_get_vl() != last->sve_vl)) {
>>> /*
>>> * Can't save the user regs, so current would
>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>> index c0ffb019ca8b..3d37f4ad2228 100644
>>> --- a/arch/arm64/kvm/arm.c
>>> +++ b/arch/arm64/kvm/arm.c
>>> @@ -1749,11 +1749,6 @@ int kvm_arch_init(void *opaque)
>>>
>>> in_hyp_mode = is_kernel_in_hyp_mode();
>>>
>>> - if (!in_hyp_mode && kvm_arch_requires_vhe()) {
>>> - kvm_pr_unimpl("CPU unsupported in non-VHE mode, not initializing\n");
>>> - return -ENODEV;
>>> - }
>>> -
>>> if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
>>> cpus_have_final_cap(ARM64_WORKAROUND_1508412))
>>> kvm_info("Guests without required CPU erratum workarounds can
>>> deadlock system!\n" \
>>> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
>>> index 3e081d556e81..cfbc869caeeb 100644
>>> --- a/arch/arm64/kvm/fpsimd.c
>>> +++ b/arch/arm64/kvm/fpsimd.c
>>> @@ -42,6 +42,15 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
>>> if (ret)
>>> goto error;
>>>
>>> + if (!has_vhe() && vcpu->arch.sve_state) {
>>> + ret = create_hyp_mappings(vcpu->arch.sve_state,
>>> + vcpu->arch.sve_state +
>>> + SVE_SIG_REGS_SIZE(sve_vq_from_vl(vcpu->arch.sve_max_vl)),
>>> + PAGE_HYP);
>>> + if (ret)
>>> + goto error;
>>> + vcpu->arch.sve_state_hyp = kern_hyp_va(vcpu->arch.sve_state);
>>
>> How useful is it to keep that duplicated pointer around, given that
>> we can compute it on the fly at a very low cost?
>>
>>> + }
>>> vcpu->arch.host_thread_info = kern_hyp_va(ti);
>>> vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
>>> error:
>>> @@ -111,8 +120,9 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
>>> if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
>>> fpsimd_save_and_flush_cpu_state();
>>>
>>> - if (guest_has_sve)
>>> + if (guest_has_sve && has_vhe())
>>> __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL12);
>>> +
>
> Spurious blank line?
Done
>
>>> } else if (host_has_sve) {
>>> /*
>>> * The FPSIMD/SVE state in the CPU has not been touched, and we
>>> @@ -121,10 +131,12 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
>>> * for EL0. To avoid spurious traps, restore the trap state
>>> * seen by kvm_arch_vcpu_load_fp():
>>> */
>
> With nVHE, the HYP exit code already restored CPACR_EL1, right?
>
> Might be worth adding a sentence explaining why skipping this code for
> nVHE is OK.
Done
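Roughly (my understanding of why skipping it is safe):

	/*
	 * With nVHE, the host's CPACR_EL1 is part of the EL1 context
	 * that hyp saves/restores around the guest run, so the host's
	 * EL0 SVE trap state is already back in place here; only the
	 * VHE case needs fixing up.
	 */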
>
>>> - if (vcpu->arch.flags & KVM_ARM64_HOST_SVE_ENABLED)
>>> - sysreg_clear_set(CPACR_EL1, 0, CPACR_EL1_ZEN_EL0EN);
>>> - else
>>> - sysreg_clear_set(CPACR_EL1, CPACR_EL1_ZEN_EL0EN, 0);
>>> + if (has_vhe()) {
>>> + if (vcpu->arch.flags & KVM_ARM64_HOST_SVE_ENABLED)
>>> + sysreg_clear_set(CPACR_EL1, 0, CPACR_EL1_ZEN_EL0EN);
>>> + else
>>> + sysreg_clear_set(CPACR_EL1, CPACR_EL1_ZEN_EL0EN, 0);
>>> + }
>>> }
>>>
>>> update_thread_flag(TIF_SVE,
>>> diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
>>> index 01f114aa47b0..15d47ebdd30a 100644
>>> --- a/arch/arm64/kvm/hyp/fpsimd.S
>>> +++ b/arch/arm64/kvm/hyp/fpsimd.S
>>> @@ -6,6 +6,7 @@
>>>
>>> #include <linux/linkage.h>
>>>
>>> +#include <asm/assembler.h>
>>> #include <asm/fpsimdmacros.h>
>>>
>>> .text
>>> @@ -19,3 +20,17 @@ SYM_FUNC_START(__fpsimd_restore_state)
>>> fpsimd_restore x0, 1
>>> ret
>>> SYM_FUNC_END(__fpsimd_restore_state)
>>> +
>>> +#ifdef CONFIG_ARM64_SVE
>>> +
>>> +SYM_FUNC_START(sve_save_state_nvhe)
>>> + sve_save_nvhe 0, x1, x2, 3, x4
>>> + ret
>>> +SYM_FUNC_END(sve_save_state_nvhe)
>>> +
>>> +SYM_FUNC_START(sve_load_state_nvhe)
>>> + sve_load_nvhe 0, x1, x2, 3, x4
>>> + ret
>>> +SYM_FUNC_END(sve_load_state_nvhe)
>>> +
>>> +#endif
>>> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h
>>> b/arch/arm64/kvm/hyp/include/hyp/switch.h
>>> index 1f875a8f20c4..a14b0d92d9ca 100644
>>> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
>>> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
>>> @@ -197,25 +197,18 @@ static inline bool __populate_fault_info(struct
>>> kvm_vcpu *vcpu)
>>> /* Check for an FPSIMD/SVE trap and handle as appropriate */
>>> static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
>>> {
>>> - bool vhe, sve_guest, sve_host;
>>> + bool sve_guest, sve_host;
>>> u8 esr_ec;
>>>
>>> if (!system_supports_fpsimd())
>>> return false;
>>>
>>> - /*
>>> - * Currently system_supports_sve() currently implies has_vhe(),
>>> - * so the check is redundant. However, has_vhe() can be determined
>>> - * statically and helps the compiler remove dead code.
>>> - */
>>> - if (has_vhe() && system_supports_sve()) {
>>> + if (system_supports_sve()) {
>>> sve_guest = vcpu_has_sve(vcpu);
>>> sve_host = vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE;
>>> - vhe = true;
>>> } else {
>>> sve_guest = false;
>>> sve_host = false;
>>> - vhe = has_vhe();
>>> }
>>>
>>> esr_ec = kvm_vcpu_trap_get_class(vcpu);
>>> @@ -229,8 +222,7 @@ static inline bool __hyp_handle_fpsimd(struct
>>> kvm_vcpu *vcpu)
>>> return false;
>>>
>>> /* Valid trap. Switch the context: */
>>> -
>>> - if (vhe) {
>>> + if (has_vhe()) {
>
> You might still want to cache this.
>
> There's no way to tell the compiler that consecutive checks on the same
> static key yield the same result, so repeatedly calling has_vhe() may
> emit a small amount of redundant code, IIUC.
Done
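The cached form, for reference (sketch):

	/* Evaluate the static key once; the compiler cannot merge
	 * repeated has_vhe() checks on its own. */
	const bool vhe = has_vhe();

	if (vhe) {
		u64 reg = read_sysreg(cpacr_el1) | CPACR_EL1_FPEN;

		if (sve_guest)
			reg |= CPACR_EL1_ZEN;
		write_sysreg(reg, cpacr_el1);
	} else {
		u64 reg = read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP;

		if (sve_guest)
			reg &= ~(u64)CPTR_EL2_TZ;
		write_sysreg(reg, cptr_el2);
	}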
>
>>> u64 reg = read_sysreg(cpacr_el1) | CPACR_EL1_FPEN;
>>>
>>> if (sve_guest)
>>> @@ -238,24 +230,30 @@ static inline bool __hyp_handle_fpsimd(struct
>>> kvm_vcpu *vcpu)
>>>
>>> write_sysreg(reg, cpacr_el1);
>>> } else {
>>> - write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
>>> - cptr_el2);
>>> + u64 reg = read_sysreg(cptr_el2) & ~CPTR_EL2_TFP ;
>
> Spurious space before ;
>
> Also, you changed ~(u64)CPTR_EL2_TFP to ~CPTR_EL2_TFP here. Does it
> make a difference?
Done
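Keeping the explicit cast, i.e.:

	u64 reg = read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP;

If I read the promotion rules right the result happens to be the same for
this constant either way, but the cast keeps the intent explicit.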
>
>>> + if (sve_guest)
>>> + reg &= ~CPTR_EL2_TZ;
>
> Same here.
>
>>> + write_sysreg(reg, cptr_el2);
>>> }
>>>
>>> isb();
>>>
>>> if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
>>> - /*
>>> - * In the SVE case, VHE is assumed: it is enforced by
>>> - * Kconfig and kvm_arch_init().
>>> - */
>>> if (sve_host) {
>>> struct thread_struct *thread = container_of(
>>> vcpu->arch.host_fpsimd_state,
>>> struct thread_struct, uw.fpsimd_state);
>>> + if (has_vhe())
>>> + sve_save_state(sve_pffr(thread),
>>> + &vcpu->arch.host_fpsimd_state->fpsr);
>>> + else {
>>> + struct kvm_cpu_context *host_ctxt =
>>> + &this_cpu_ptr(&kvm_host_data)->host_ctxt;
>>
>> Assignments on a single line please.
>>
>>> + sve_save_state_nvhe(sve_pffr(thread),
>>> + &vcpu->arch.host_fpsimd_state->fpsr,
>>> + ctxt_sys_reg(host_ctxt, ZCR_EL2));
>>> + }
>
> Does this actually need to be different from the VHE case, if we just
> leave ZCR_EL2 permanently set with the maximum VL allowed for the guest?
>
>>>
>>> - sve_save_state(sve_pffr(thread),
>>> - &vcpu->arch.host_fpsimd_state->fpsr);
>>> } else {
>>> __fpsimd_save_state(vcpu->arch.host_fpsimd_state);
>>> }
>>> @@ -264,10 +262,18 @@ static inline bool __hyp_handle_fpsimd(struct
>>> kvm_vcpu *vcpu)
>>> }
>>>
>>> if (sve_guest) {
>>> - sve_load_state(vcpu_sve_pffr(vcpu),
>>> + if (has_vhe()) {
>>> + sve_load_state(vcpu_sve_pffr(vcpu),
>>> &vcpu->arch.ctxt.fp_regs.fpsr,
>>> sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1);
>>> - write_sysreg_s(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR_EL12);
>>> + write_sysreg_s(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR_EL12);
>>> + } else {
>>> + sve_load_state_nvhe(vcpu_sve_pffr_hyp(vcpu),
>>> + &vcpu->arch.ctxt.fp_regs.fpsr,
>>> + sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1);
>>> + write_sysreg_s(__vcpu_sys_reg(vcpu, ZCR_EL2), SYS_ZCR_EL2);
>>
>> I think you really need to explain why you restore ZCR_EL2 here, and not
>> ZCR_EL1.
>
> Something doesn't smell quite right here. ZCR_EL2.LEN determines how
> regs are saved/restored, so for nVHE I think it needs to change as
> follows:
>
> * In host, after saving guest SVE regs:
>
> ZCR_EL2.LEN = ZCR_ELx_LEN_MASK (from __init_el2_nvhe_sve)
>
> * Before saving the host SVE regs here:
>
> ZCR_EL2.LEN = host ZCR_EL1.LEN
> (since the size and layout of current->thread.sve_state is VL dependent)
>
> * Before loading the guest SVE regs, and before running in the guest with
> SVE untrapped:
>
> ZCR_EL2.LEN = sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1
>
>
> I don't quite see how the code achieves this, though I might be missing
> something.
>
>
> The alternative would be to dump the registers in a common layout by
> setting ZCR_EL2.LEN permanently to ZCR_ELx_LEN_MASK except when actually
> running the guest. But that may tend to waste a bit of memory, and it
> would be necessary to save the host regs in a side buffer and convert
> them to host layout when exiting the guest.
>
> This would avoid slowness arising from saving regs immediately after a
> ZCR.LEN write, but would add complexity elsewhere.
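To make sure I follow, the sequencing would look something like this on
the nVHE switch-to-guest path (a sketch; host_vq_minus_1 is assumed to be
derived from the host's ZCR_EL1.LEN):

	/* The host regs were laid out for the host's VL... */
	sve_save_state_nvhe(sve_pffr(thread),
			    &vcpu->arch.host_fpsimd_state->fpsr,
			    host_vq_minus_1);
	/* ...while the guest regs follow the vcpu's maximum VL. */
	sve_load_state_nvhe(vcpu_sve_pffr(vcpu),
			    &vcpu->arch.ctxt.fp_regs.fpsr,
			    sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1);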
>
>>
>>> +
>>> + }
>>> } else {
>>> __fpsimd_restore_state(&vcpu->arch.ctxt.fp_regs);
>>> }
>>> diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>>> b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>>> index cce43bfe158f..45c16b81b826 100644
>>> --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>>> +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
>>> @@ -46,6 +46,19 @@ static inline void __sysreg_save_el1_state(struct
>>> kvm_cpu_context *ctxt)
>>> ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg_par();
>>> ctxt_sys_reg(ctxt, TPIDR_EL1) = read_sysreg(tpidr_el1);
>>>
>>> + if (!has_vhe()) {
>>> + u64 reg = read_sysreg(cptr_el2);
>>> + if (reg & CPTR_EL2_TZ) {
>>> + write_sysreg(reg & ~CPTR_EL2_TZ, cptr_el2);
>>> + ctxt_sys_reg(ctxt, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL1);
>>> + ctxt_sys_reg(ctxt, ZCR_EL2) = read_sysreg_s(SYS_ZCR_EL2);
>>> + write_sysreg(reg, cptr_el2);
>>> + } else {
>>> + ctxt_sys_reg(ctxt, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL1);
>>> + ctxt_sys_reg(ctxt, ZCR_EL2) = read_sysreg_s(SYS_ZCR_EL2);
>>
>> Why do you need to save ZCR_EL2? It cannot change behind your back, can it?
>> Also, why aren't these registers saved lazily? If TZ is set, the guest
>> cannot
>> possibly have changed ZCR_EL1.
>>
>>> + }
>>> + }
>>> +
>>> ctxt_sys_reg(ctxt, SP_EL1) = read_sysreg(sp_el1);
>>> ctxt_sys_reg(ctxt, ELR_EL1) = read_sysreg_el1(SYS_ELR);
>>> ctxt_sys_reg(ctxt, SPSR_EL1) = read_sysreg_el1(SYS_SPSR);
>>> @@ -107,6 +120,19 @@ static inline void
>>> __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
>>> write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1), par_el1);
>>> write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1), tpidr_el1);
>>>
>>> + if (!has_vhe()) {
>>> + u64 reg = read_sysreg(cptr_el2);
>>> + if (reg & CPTR_EL2_TZ) {
>>> + write_sysreg(reg & ~CPTR_EL2_TZ, cptr_el2);
>>> + write_sysreg_s(ctxt_sys_reg(ctxt, ZCR_EL1), SYS_ZCR_EL1);
>>> + write_sysreg_s(ctxt_sys_reg(ctxt, ZCR_EL2), SYS_ZCR_EL2);
>>> + write_sysreg(reg, cptr_el2);
>>
>> Same thing here. If TZ is set, there is no point in restoring these
>> registers,
>> and it should be delegated to the point where you actually restore the bulk
>> of the SVE state.
>
> Ack
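For v2 the restore will be deferred along these lines (sketch):

	/*
	 * Only restore ZCR_EL1 when SVE is untrapped, together with
	 * the bulk of the SVE state; if CPTR_EL2.TZ is set the guest
	 * cannot have touched it.
	 */
	if (!(read_sysreg(cptr_el2) & CPTR_EL2_TZ))
		write_sysreg_s(ctxt_sys_reg(ctxt, ZCR_EL1), SYS_ZCR_EL1);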
>
>>
>>> + } else {
>>> + write_sysreg_s(ctxt_sys_reg(ctxt, ZCR_EL1), SYS_ZCR_EL1);
>>> + write_sysreg_s(ctxt_sys_reg(ctxt, ZCR_EL2), SYS_ZCR_EL2);
>>> + }
>>> + }
>>> +
>>> if (!has_vhe() &&
>>> cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
>>> ctxt->__hyp_running_vcpu) {
>>> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c
>>> b/arch/arm64/kvm/hyp/nvhe/switch.c
>>> index 8ae8160bc93a..bffd78203448 100644
>>> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
>>> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
>>> @@ -41,7 +41,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>>>
>>> val = CPTR_EL2_DEFAULT;
>>> val |= CPTR_EL2_TTA | CPTR_EL2_TZ | CPTR_EL2_TAM;
>>> - if (!update_fp_enabled(vcpu)) {
>>> +
>>> + if (update_fp_enabled(vcpu)) {
>>> + if (vcpu_has_sve(vcpu))
>>> + val &= ~CPTR_EL2_TZ;
>>> + } else {
>>
>> nit: please don't invert conditions like this.
Done; I just wanted to have the same flow as in the VHE code.
>>
>>> val |= CPTR_EL2_TFP;
>>> __activate_traps_fpsimd32(vcpu);
>>> }
>>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>>> index f32490229a4c..3c62fb10a3fe 100644
>>> --- a/arch/arm64/kvm/reset.c
>>> +++ b/arch/arm64/kvm/reset.c
>>> @@ -127,10 +127,6 @@ static int kvm_vcpu_enable_sve(struct kvm_vcpu *vcpu)
>>> if (!system_supports_sve())
>>> return -EINVAL;
>>>
>>> - /* Verify that KVM startup enforced this when SVE was detected: */
>>> - if (WARN_ON(!has_vhe()))
>>> - return -EINVAL;
>>> -
>>> vcpu->arch.sve_max_vl = kvm_sve_max_vl;
>>>
>>> /*
>>> @@ -168,6 +164,10 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu
>>> *vcpu)
>>> return -ENOMEM;
>>>
>>> vcpu->arch.sve_state = buf;
>>> +
>>> + if (!has_vhe())
>>> + __vcpu_sys_reg(vcpu, ZCR_EL2) = sve_vq_from_vl(vl) - 1;
>>
>> As I understand it, this is the only time where ZCR_EL2 is ever set
>> for a given guest. It only has to be restored on guest entry, never
>> saved back.
>
> [...]
>>
>> Thanks,
>>
>> M.
>
> (Except that the hyp code may need to mess with ZCR_EL2 as outlined
> above. But ack that the value that needs to be in ZCR_EL2 while the
> guest runs should never change from the value determined here IIUC.)
>
> Cheers
> ---Dave