[PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP

Marc Zyngier maz at kernel.org
Wed May 5 09:46:51 PDT 2021


Hi Zenghui,

On Wed, 05 May 2021 15:23:02 +0100,
Zenghui Yu <yuzenghui at huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2020/11/3 0:40, Marc Zyngier wrote:
> > In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> > systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> > to increment PC post emulation is now signalled via a flag in the
> > vcpu structure.
> > 
> > Signed-off-by: Marc Zyngier <maz at kernel.org>
> 
> [...]
> 
> > @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> >  	__load_guest_stage2(vcpu->arch.hw_mmu);
> >  	__activate_traps(vcpu);
> >  +	__adjust_pc(vcpu);
> 
> If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
> handling a PSCI CPU_ON call targeting this VCPU, the *target_pc* (aka
> entry point address, normally provided by the primary VCPU) will be
> unexpectedly incremented here. That's pretty bad, I think.

How can you online a CPU using PSCI if that CPU is currently spinning
on a WFI? Or is it that we have transitioned via userspace to perform
the vcpu reset? I can imagine it happening in that case.

> This was noticed with a latest guest kernel, at least with commit
> dccc9da22ded ("arm64: Improve parking of stopped CPUs"), which puts the
> stopped VCPUs in the WFx loop. The guest kernel shouted at me that
> 
> 	"CPU: CPUs started in inconsistent modes"

Ah, the perks of running guests with "quiet"... Well caught.

> *after* rebooting. The problem is that the secondary entry point was
> corrupted by KVM as explained above. All of the secondary processors
> started from set_cpu_boot_mode_flag(), with w0=0. Oh well...
> 
> I wrote the diff below and guess it will help. But I have to look at all
> the other places where we adjust PC directly to make a proper fix. Please
> let me know what you think.
> 
> 
> Thanks,
> Zenghui
> 
> ---->8----
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 956cdc240148..ed647eb387c3 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -265,7 +265,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  		if (vcpu->arch.reset_state.be)
>  			kvm_vcpu_set_be(vcpu);
> 
> +		/*
> +		 * Don't bother with the KVM_ARM64_INCREMENT_PC flag while
> +		 * using this version of __adjust_pc().
> +		 */
>  		*vcpu_pc(vcpu) = target_pc;
> +		vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;

I think you need to make it a lot stronger: any PC-altering flag will
do the wrong thing here. I'd go and clear all the exception bits:

Thanks,

	M.

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 956cdc240148..54913612d602 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -265,6 +265,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		if (vcpu->arch.reset_state.be)
 			kvm_vcpu_set_be(vcpu);
 
+		/*
+		 * We're resetting the CPU, so make sure there is no
+		 * pending exception or other PC-altering event.
+		 */
+		vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
+				      KVM_ARM64_EXCEPT_MASK);
 		*vcpu_pc(vcpu) = target_pc;
 		vcpu_set_reg(vcpu, 0, vcpu->arch.reset_state.r0);
 

-- 
Without deviation from the norm, progress is not possible.
