[PATCH v3 2/3] KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs

Thu Mar 3 08:10:59 PST 2022

Reiji,

Please add a cover letter to your patch series. It is important for
tracking the changes, and it also serves as an anchor in my email client.

On Thu, 03 Mar 2022 03:54:07 +0000,
Reiji Watanabe <reijiw at google.com> wrote:
> 
> KVM allows userspace to configure either all EL1 32bit or 64bit vCPUs
> for a guest.  At vCPU reset, vcpu_allowed_register_width() checks
> if the vcpu's register width is consistent with all other vCPUs'.
> Since the checking is done even against vCPUs that are not initialized
> (KVM_ARM_VCPU_INIT has not been done) yet, the uninitialized vCPUs
> are erroneously treated as 64bit vCPU, which causes the function to
> incorrectly detect a mixed-width VM.
> 
> Introduce KVM_ARCH_FLAG_EL1_32BIT and KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED
> bits for kvm->arch.flags.  A value of the EL1_32BIT bit indicates that
> the guest needs to be configured with all 32bit or 64bit vCPUs, and
> a value of the REG_WIDTH_CONFIGURED bit indicates if a value of the
> EL1_32BIT bit is valid (already set up). Values in those bits are set at
> the first KVM_ARM_VCPU_INIT for the guest based on KVM_ARM_VCPU_EL1_32BIT
> configuration for the vCPU.
> 
> Check vcpu's register width against those new bits at the vcpu's
> KVM_ARM_VCPU_INIT (instead of against other vCPUs' register width).
> 
> Fixes: 66e94d5cafd4 ("KVM: arm64: Prevent mixed-width VM creation")
> Signed-off-by: Reiji Watanabe <reijiw at google.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 25 +++++++++++------
>  arch/arm64/include/asm/kvm_host.h    |  8 ++++++
>  arch/arm64/kvm/arm.c                 | 41 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/reset.c               |  8 ------
>  4 files changed, 65 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index d62405ce3e6d..f4f960819888 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -20,6 +20,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/cputype.h>
>  #include <asm/virt.h>
> +#include <asm/kvm_mmu.h>

Huh... I wish we didn't drag that one here, it is eventually going to
hurt...

>  
>  #define CURRENT_EL_SP_EL0_VECTOR	0x0
>  #define CURRENT_EL_SP_ELx_VECTOR	0x200
> @@ -45,7 +46,14 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
>  
>  static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>  {
> -	return !(vcpu->arch.hcr_el2 & HCR_RW);
> +	struct kvm *kvm;
> +
> +	kvm = is_kernel_in_hyp_mode() ? kern_hyp_va(vcpu->kvm) : vcpu->kvm;

Errr... To a first approximation, this is the wrong way around. A VHE
kernel doesn't need any repainting of the address, while a nVHE kernel
does. Even more, a bit of context:

static inline bool is_kernel_in_hyp_mode(void)
{
	return read_sysreg(CurrentEL) == CurrentEL_EL2;
}

So not only is the expression the wrong way around, but it *cannot*
distinguish VHE and nVHE when running at EL2. You're just lucky that
the two bugs (on a single line) cancel each other out.

The only sane way to write this is to *not* look at the mode you're
running in. kern_hyp_va() is designed to be nop'ed out on VHE.

> +
> +	WARN_ON_ONCE(!test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED,
> +			       &kvm->arch.flags));
> +
> +	return test_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags);
>  }

Given that this is used on the vcpu switch fast path at least twice
per run, we need something better. You probably want to offer
different primitives depending on the context:

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index d62405ce3e6d..daea0885c28d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -43,10 +43,22 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
 
+#if defined (__KVM_VHE_HYPERVISOR__) || defined (__KVM_NVHE_HYPERVISOR__)
 static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 {
 	return !(vcpu->arch.hcr_el2 & HCR_RW);
 }
+#else
+static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
+
+	WARN_ON_ONCE(!test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED,
+			       &kvm->arch.flags));
+
+	return test_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags);
+}
+#endif
 
as you are guaranteed to have configured the width of the vcpu by the
time you start messing with it in the context of the hypervisor.

>  
>  static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> @@ -72,15 +80,14 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  		vcpu->arch.hcr_el2 |= HCR_TVM;
>  	}
>  
> -	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
> +	if (vcpu_el1_is_32bit(vcpu))
>  		vcpu->arch.hcr_el2 &= ~HCR_RW;
> -
> -	/*
> -	 * TID3: trap feature register accesses that we virtualise.
> -	 * For now this is conditional, since no AArch32 feature regs
> -	 * are currently virtualised.
> -	 */
> -	if (!vcpu_el1_is_32bit(vcpu))
> +	else
> +		/*
> +		 * TID3: trap feature register accesses that we virtualise.
> +		 * For now this is conditional, since no AArch32 feature regs
> +		 * are currently virtualised.
> +		 */
>  		vcpu->arch.hcr_el2 |= HCR_TID3;
>  
>  	if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 11a7ae747ded..5cde7f7b5042 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -125,6 +125,14 @@ struct kvm_arch {
>  #define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER	0
>  	/* Memory Tagging Extension enabled for the guest */
>  #define KVM_ARCH_FLAG_MTE_ENABLED			1
> +	/*
> +	 * The guest's EL1 register width.  A value of KVM_ARCH_FLAG_EL1_32BIT
> +	 * bit is valid only when KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED is set.
> +	 * Otherwise, the guest's EL1 register width has not yet been
> +	 * determined yet.
> +	 */
> +#define KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED		2
> +#define KVM_ARCH_FLAG_EL1_32BIT				3
>  	unsigned long flags;
>  
>  	/*
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 9a2d240ef6a3..9ac75aa46e2f 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1101,6 +1101,43 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
>  	return -EINVAL;
>  }
>  
> +/*
> + * A guest can have either all EL1 32bit or 64bit vcpus only. It is
> + * indicated by a value of KVM_ARCH_FLAG_EL1_32BIT bit in kvm->arch.flags,
> + * which is valid only when KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED in
> + * kvm->arch.flags is set.
> + * This function checks if the vCPU's register width configuration is
> + * consistent with a value of the EL1_32BIT bit in kvm->arch.flags
> + * when the REG_WIDTH_CONFIGURED bit is set.
> + * Otherwise, the function sets a value of EL1_32BIT bit based on the vcpu's
> + * KVM_ARM_VCPU_EL1_32BIT configuration (and sets the REG_WIDTH_CONFIGURED
> + * bit of kvm->arch.flags).
> + */
> +static int kvm_register_width_check_or_init(struct kvm_vcpu *vcpu)

The naming is positively Java-esque! How about kvm_set_vm_width()
instead? Also, please document the error code.
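
To make the request concrete, here is a minimal userspace sketch of
what a documented return value could look like. It is not the kernel
implementation: struct model_vm, model_test_bit() and
model_set_vm_width() are hypothetical stand-ins, locking is left to
the caller, and only the bit numbers are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Bit numbers as proposed in the patch; everything else is a model. */
#define FLAG_REG_WIDTH_CONFIGURED	2
#define FLAG_EL1_32BIT			3

/* Hypothetical stand-in for the part of struct kvm that matters here. */
struct model_vm {
	unsigned long flags;	/* models kvm->arch.flags */
};

static bool model_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/*
 * model_set_vm_width() - record or check the VM-wide EL1 register width
 * @vm:      VM model (the real code would hold kvm->lock around this)
 * @is32bit: width requested for the vcpu being initialised
 *
 * Return: 0 if the width was recorded for the first time or matches
 * the recorded width, -EINVAL if it conflicts with the recorded width.
 */
static int model_set_vm_width(struct model_vm *vm, bool is32bit)
{
	assert(vm);

	if (model_test_bit(FLAG_REG_WIDTH_CONFIGURED, &vm->flags)) {
		if (is32bit != model_test_bit(FLAG_EL1_32BIT, &vm->flags))
			return -EINVAL;
		return 0;
	}

	/* First vcpu to be initialised decides the width for the VM. */
	if (is32bit)
		vm->flags |= 1UL << FLAG_EL1_32BIT;
	else
		vm->flags &= ~(1UL << FLAG_EL1_32BIT);
	vm->flags |= 1UL << FLAG_REG_WIDTH_CONFIGURED;

	return 0;
}
```

The point is only that a caller can read the comment and know which
error to expect on a width mismatch, without reading the body.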

> +{
> +	bool is32bit;
> +	bool allowed = true;
> +	struct kvm *kvm = vcpu->kvm;
> +
> +	is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
> +
> +	mutex_lock(&kvm->lock);
> +
> +	if (test_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags)) {
> +		allowed = (is32bit ==
> +			   test_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags));
> +	} else {
> +		if (is32bit)
> +			set_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags);

nit: probably best written as:

		__assign_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags, is32bit);
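
As a rough illustration of the difference, the userspace sketch below
models both forms. model_assign_bit() and model_set_if_32bit() are
hypothetical helpers, not kernel functions; the first only mimics the
non-atomic set-or-clear behaviour of __assign_bit(). Assuming the
flags word starts out cleared, both forms end up with the same value,
but only the assign form also clears a stale bit.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EL1_32BIT	3	/* bit number taken from the patch */

/* Non-atomic model of an assign-bit helper: set or clear one bit. */
static void model_assign_bit(int nr, unsigned long *addr, bool value)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(*addr)));

	if (value)
		*addr |= 1UL << nr;
	else
		*addr &= ~(1UL << nr);
}

/* Model of the open-coded form in the patch: it only ever sets the bit. */
static void model_set_if_32bit(unsigned long *flags, bool is32bit)
{
	if (is32bit)
		*flags |= 1UL << MODEL_EL1_32BIT;
}
```

With a zeroed flags word the two helpers are interchangeable; the
one-liner simply states the intent (the bit mirrors is32bit) directly.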

> +
> +		set_bit(KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED, &kvm->arch.flags);

Since this is only ever set whilst holding the lock, you can use the
__set_bit() version.

> +	}
> +
> +	mutex_unlock(&kvm->lock);
> +
> +	return allowed ? 0 : -EINVAL;
> +}
> +
>  static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  			       const struct kvm_vcpu_init *init)
>  {
> @@ -1140,6 +1177,10 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  
>  	/* Now we know what it is, we can reset it. */
>  	ret = kvm_reset_vcpu(vcpu);
> +
> +	if (!ret)
> +		ret = kvm_register_width_check_or_init(vcpu);

Why is that called *after* resetting the vcpu, which itself relies on
KVM_ARM_VCPU_EL1_32BIT, which we agreed to get rid of as much as
possible?
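
One possible reading of the ordering concern, as a userspace sketch
only: settle the VM-wide width first, then let reset consult that
VM-wide state rather than the per-vcpu feature bit. All names below
(struct model_vm, struct model_vcpu, model_set_vm_width(),
model_reset_vcpu(), model_vcpu_set_target()) are hypothetical and do
not reflect the actual kernel code paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_vm {
	bool width_configured;	/* models REG_WIDTH_CONFIGURED */
	bool el1_32bit;		/* models EL1_32BIT */
};

struct model_vcpu {
	struct model_vm *vm;
	bool wants_32bit;	/* models the KVM_ARM_VCPU_EL1_32BIT feature */
	bool reset_done;
	bool reset_as_32bit;
};

/* Record the VM width on first use, otherwise check for a match. */
static int model_set_vm_width(struct model_vcpu *vcpu)
{
	struct model_vm *vm = vcpu->vm;

	if (vm->width_configured)
		return vcpu->wants_32bit == vm->el1_32bit ? 0 : -EINVAL;

	vm->el1_32bit = vcpu->wants_32bit;
	vm->width_configured = true;
	return 0;
}

/* Reset relies on the VM-wide width, which must already be settled. */
static int model_reset_vcpu(struct model_vcpu *vcpu)
{
	assert(vcpu->vm->width_configured);

	vcpu->reset_as_32bit = vcpu->vm->el1_32bit;
	vcpu->reset_done = true;
	return 0;
}

/* Width check first, reset second: a mismatch never reaches reset. */
static int model_vcpu_set_target(struct model_vcpu *vcpu)
{
	int ret = model_set_vm_width(vcpu);

	if (ret)
		return ret;

	return model_reset_vcpu(vcpu);
}
```

In this model a vcpu with a conflicting width is rejected before any
reset state has been touched, and reset never needs to look at the
per-vcpu feature bit.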

	M.

-- 
Without deviation from the norm, progress is not possible.