[PATCH v2 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
Marc Zyngier
maz at kernel.org
Wed May 5 23:33:22 PDT 2021
On Wed, 05 May 2021 17:46:51 +0100,
Marc Zyngier <maz at kernel.org> wrote:
>
> Hi Zenghui,
>
> On Wed, 05 May 2021 15:23:02 +0100,
> Zenghui Yu <yuzenghui at huawei.com> wrote:
> >
> > Hi Marc,
> >
> > On 2020/11/3 0:40, Marc Zyngier wrote:
> > > In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> > > systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> > > to increment PC post emulation is now signalled via a flag in the
> > > vcpu structure.
> > >
> > > Signed-off-by: Marc Zyngier <maz at kernel.org>
> >
> > [...]
> >
> > > @@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> > > __load_guest_stage2(vcpu->arch.hw_mmu);
> > > __activate_traps(vcpu);
> > > + __adjust_pc(vcpu);
> >
> > If the INCREMENT_PC flag was set (e.g., for WFx emulation) while we're
> > handling PSCI CPU_ON call targeting this VCPU, the *target_pc* (aka
> > entry point address, normally provided by the primary VCPU) will be
> > unexpectedly incremented here. That's pretty bad, I think.
>
> How can you online a CPU using PSCI if that CPU is currently spinning
> on a WFI? Or is it that we have transitioned via userspace to perform the
> vcpu reset? I can imagine it happening in that case.
>
> > This was noticed with a recent guest kernel, at least with commit
> > dccc9da22ded ("arm64: Improve parking of stopped CPUs"), which put the
> > stopped VCPUs in the WFx loop. The guest kernel shouted at me that
> >
> > "CPU: CPUs started in inconsistent modes"
>
> Ah, the perks of running guests with "quiet"... Well caught.
>
> > *after* rebooting. The problem is that the secondary entry point was
> > corrupted by KVM as explained above. All of the secondary processors
> > started from set_cpu_boot_mode_flag(), with w0=0. Oh well...
> >
> > I wrote the diff below and guess it will help. But I have to look at all
> > other places where we adjust PC directly to make a proper fix. Please let
> > me know what you think.
> >
> >
> > Thanks,
> > Zenghui
> >
> > ---->8----
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > index 956cdc240148..ed647eb387c3 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -265,7 +265,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> > if (vcpu->arch.reset_state.be)
> > kvm_vcpu_set_be(vcpu);
> >
> > + /*
> > + * Don't bother with the KVM_ARM64_INCREMENT_PC flag while
> > + * using this version of __adjust_pc().
> > + */
> > *vcpu_pc(vcpu) = target_pc;
> > + vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
Actually, this is far worse than it looks, and this only papers over
one particular symptom. We need to resolve all pending PC updates
*before* returning to userspace, or things like live migration can
observe an inconsistent state.
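
To make the failure mode concrete, here is a toy model of a deferred PC
increment. It is only a sketch: every name (struct toy_vcpu,
TOY_INCREMENT_PC, the toy_*() helpers) is made up for illustration and
is not the KVM code. It assumes a fixed 4-byte instruction size and a
single flag bit. It shows the stale-flag symptom reported above, the
flag-clearing workaround from the quoted diff, and the idea of resolving
the pending update before any state is handed to userspace.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a "PC increment pending" flag bit. */
#define TOY_INCREMENT_PC (1u << 0)

/* Hypothetical, heavily reduced vcpu state: just a PC and a flag word. */
struct toy_vcpu {
	uint64_t pc;
	uint32_t flags;
};

/* Emulation side: request that the trapped instruction be skipped later. */
static void toy_request_skip(struct toy_vcpu *v)
{
	v->flags |= TOY_INCREMENT_PC;
}

/* Guest-entry side: apply and clear any pending PC increment. */
static void toy_adjust_pc(struct toy_vcpu *v)
{
	if (v->flags & TOY_INCREMENT_PC) {
		v->pc += 4;	/* assumed fixed 4-byte instruction */
		v->flags &= ~TOY_INCREMENT_PC;
	}
}

/* Reset that overwrites the PC but leaves a stale flag behind. */
static void toy_reset_buggy(struct toy_vcpu *v, uint64_t target_pc)
{
	v->pc = target_pc;
}

/* Reset that also drops the pending increment (the workaround). */
static void toy_reset_clearing(struct toy_vcpu *v, uint64_t target_pc)
{
	v->pc = target_pc;
	v->flags &= ~TOY_INCREMENT_PC;
}

/* Userspace-visible read: resolve pending updates before exposing the PC. */
static uint64_t toy_userspace_read_pc(struct toy_vcpu *v)
{
	toy_adjust_pc(v);
	assert(!(v->flags & TOY_INCREMENT_PC));
	return v->pc;
}
```

With this model, toy_request_skip() followed by toy_reset_buggy() and
toy_adjust_pc() leaves the PC at target_pc + 4, which mirrors the
corrupted entry point. toy_reset_clearing() avoids that one symptom,
while toy_userspace_read_pc() illustrates the broader point: nothing
pending should remain once userspace can look at the state.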

I'll try and cook something up.

Thanks,

	M.

--
Without deviation from the norm, progress is not possible.