[RFC PATCH 0/6] KVM: arm64: Errata management for VM Live migration

Marc Zyngier maz at kernel.org
Fri Oct 11 03:37:24 PDT 2024


Hi Shameer,

Thanks for getting the ball rolling on this one, much appreciated.

On Fri, 11 Oct 2024 08:50:47 +0100,
Shameer Kolothum <shameerali.kolothum.thodi at huawei.com> wrote:
> 
> Hi,
> 
> On ARM64 platforms most of the errata workarounds are based on CPU
> MIDR/REVIDR values and a number of these workarounds need to be
> implemented by the Guest kernel as well. This creates a problem when
> a Guest needs to be migrated to a platform that differs in these
> MIDR/REVIDR values even if the VMM can come up with a common minimum
> feature list for the Guest using the recently introduced "Writable
> ID registers" support.
> 
> (This is roughly based on a discussion I had with Marc and Oliver
> at KVM forum. Marc outlined his idea for a solution and this is an
> attempt to implement it. Thanks to both and I take all the blame
> if this is nowhere near what is intended/required)
> 
> This RFC proposes a solution to handle the above issue by introducing
> the following,
> 
> 1. A new VM IOCTL,
>    KVM_ARM_SET_MIGRN_TARGET_CPUS  _IOW(KVMIO,  0xb7, struct kvm_arm_migrn_cpus)
>    This can be used by userspace (the VMM) to set the target CPUs the
>    Guest will run on in its lifetime. See patch #2.
> 2. Hypercall support for the Guest kernel to retrieve any migration
>    errata bitmap (ARM_SMCCC_VENDOR_HYP_KVM_MIGRN_ERRATA).
>    The above will return the bitmaps in the R0-R3 registers. See patch #4.
> 3. The "capability" field in struct arm64_cpu_capabilities is a generated
>    one at present and may get renumbered or reordered. Hence, we can't use
>    it directly for migration errata bitmaps. Instead, this series
>    introduces "migartion_safe_cap", which has to be set statically for
>    any erratum that needs to be enabled and is safe for migration
>    purposes. See patches 3 & 6.
> 4. The rest of the patches include the plumbing required to populate
>    the errata bitmap based on the target CPUs set by the VMM and update
>    the system_cap based on it.
> 
> ToDos:
>   - We still need a way to handle the error in setting the invariant
>     registers (MIDR/REVIDR/AIDR) during Guest migration. Perhaps we can
>     handle it in userspace?
>   - Possibly we could do better to avoid the additional
>     "migartion_safe_cap" use. Suggestions welcome.
>   - There are errata that require more than MIDR/REVIDR, e.g. CTR_EL0.
>     How to handle those?
>   - Check for locking requirements, if any.
> 
> This is lightly tested on a HiSilicon ARM64 platform.
> 
> Please take a look and let me know your thoughts.

Having eyeballed this very superficially, I think we can do something
simpler, and maybe more future-proof:

- I don't think KVM should be concerned about the description of the
  target CPUs. The hypercall you defined is the right thing to do,
  but the VMM should completely handle it. That's an implementation
  detail, but it would make things much simpler.

- I don't think the "errata bitmap" works. That's a construct that is
  specific to Linux, and that cannot be supported for other OSs. It
  also limits the described issues to those the host knows, instead of
  the guest. The host doesn't have a clue what the guest really wants.
  Really, the guest should have enough information to decide what to
  do based on its own view of the ID registers and the list of CPUs it
  runs on.

- To answer your question about CTR_EL0: KVM should (and does)
  sanitise that register by trapping it. This should be the default
  behaviour for things that need to be mitigated outside of
  MIDR/REVIDR.
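
To make the second point a bit more concrete, here is a rough sketch of
what "the guest decides" could look like. It assumes the guest has
somehow obtained a list of MIDR/REVIDR pairs for the CPUs it may run on.
The struct names, the table layout and the helper names are all made up
for illustration and are not taken from the series or from any existing
kernel interface. Only the MIDR_EL1 field layout is architectural.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 field layout (architectural). */
#define MIDR_IMPLEMENTER(m)	(((m) >> 24) & 0xff)
#define MIDR_VARIANT(m)		(((m) >> 20) & 0xf)
#define MIDR_PARTNUM(m)		(((m) >> 4) & 0xfff)
#define MIDR_REVISION(m)	((m) & 0xf)

/* Hypothetical: one entry per CPU implementation the guest may run on. */
struct target_cpu {
	uint32_t midr;
	uint64_t revidr;	/* carried along, not interpreted in this sketch */
};

/* Hypothetical guest-side erratum description: a model and an rXpY range. */
struct erratum_match {
	uint8_t  implementer;
	uint16_t partnum;
	uint8_t  var_min, rev_min;	/* first affected revision, inclusive */
	uint8_t  var_max, rev_max;	/* last affected revision, inclusive */
};

/* Does this one CPU fall within the erratum's model and revision range? */
static bool cpu_affected(const struct erratum_match *e,
			 const struct target_cpu *c)
{
	unsigned int vr = (MIDR_VARIANT(c->midr) << 4) | MIDR_REVISION(c->midr);
	unsigned int lo = ((unsigned int)e->var_min << 4) | e->rev_min;
	unsigned int hi = ((unsigned int)e->var_max << 4) | e->rev_max;

	return MIDR_IMPLEMENTER(c->midr) == e->implementer &&
	       MIDR_PARTNUM(c->midr) == e->partnum &&
	       vr >= lo && vr <= hi;
}

/*
 * The guest applies its own policy: enable the workaround if *any* CPU
 * it may ever run on is affected, not just the one it booted on.
 */
static bool erratum_needed(const struct erratum_match *e,
			   const struct target_cpu *cpus, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (cpu_affected(e, &cpus[i]))
			return true;

	return false;
}
```

The point of the sketch is that the matching table lives in the guest,
so a guest that knows about more (or different) errata than the host
still makes the right call, and a non-Linux guest can do the same with
its own table.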

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.