[PATCH RFC] KVM: arm64: allow ID_MMFR4_EL1 to be writable

Oliver Upton oliver.upton at linux.dev
Thu May 2 09:45:53 PDT 2024


On Thu, May 02, 2024 at 03:40:38PM +0100, Russell King (Oracle) wrote:
> On Wed, May 01, 2024 at 06:59:17PM +0000, Oliver Upton wrote:
> > On Wed, May 01, 2024 at 07:08:05PM +0100, Russell King (Oracle) wrote:
> > > Yes, it did strike me as odd, since the description seems to imply that
> > > XNX affects EL2, which the VM wouldn't have access to. So I'm not sure
> > > why we don't just force it to zero.
> > 
> > Probably because we failed to catch it in the first place and setting to
> > 0 now would be even more UAPI breakage. Meh :-/ I don't see any immediate
> > issues with the patch, especially since it is fixing a genuine UAPI
> > breakage in KVM.
> 
> I think the only two ways around this would be to:
> 
> 1) teach QEMU about the contents of these registers, with which fields
>    in these registers can be ignored when reloading a VMs context.
> 
> 2) allow userspace to write to the XNX field such that it can be set
>    to values seen with previous kernels (thus allowing at least one-
>    way migration.)
> 
> (1) has the advantage that reloading a VM state on older vs newer
> kernels can work in either direction, whereas (2) would only work
> for state saved on an older kernel loaded onto a newer kernel.

Yeah, so this is something that has affected my employer as well.

I think (1) should only be expected of VMMs that want rollback safety,
i.e. the ability to migrate state back to an older kernel. Our userspace
initializes vCPUs from a fixed set of feature ID register values that
prevents VMs on new kernels from picking up new CPU features.

It is quite tedious, but necessary as rollback safety is very much a
non-goal of the KVM UAPI.

OTOH, in cases where KVM screws up and breaks UAPI, the kernel needs to
do something special to accept the previously-advertised state even if
it were nonsensical.

For example, there was a bug where KVM advertised an IMP DEF PMU to VMs
even though the only thing KVM virtualizes is PMUv3. We fixed it in
commit f90f9360c3d7 ("KVM: arm64: Rewrite IMPDEF PMU version as NI") by
accepting the old value in the ioctl and changing the field to NI
internally.

I dislike these sort of hacks, but when we're caught between upholding
UAPI and the architecture it seems to be the best option. I wonder if an
approach similar to this would be sufficient to address the SPE change
that you noticed.

> I've been bitten before with KVM differences between kernel versions
> in the past - where the number of registers that userspace sees has
> changed despite being on the same hardware.

This is intended behavior, as VMs are initialized to the maximum feature
set KVM is able to support. Forward-compatibility for the set of exposed
registers is tested, see the get-reg-list selftest.

> I'm now wondering what
> testing goes on to validate this part of the UAPI across different
> kernel versions on the same hardware.

We may've been a bit more relaxed in the past with this, but in recent
history we've been careful about preserving UAPI. On top of that, we now
have some generalized infrastructure for dealing with these things by
way of the 'writable' / mutable ID register work.

Although it isn't precisely what you're looking for, the set_id_regs
selftest ensures we at least accept features that are valid for the
underlying HW platform and explicitly tests downgrades.

> I did knock up a test program that dumped the list of registers so
> that one could trivially diff the output between various kernels.
> Maybe I need to extend that to dump the register values themselves,
> and then maybe we need to find a way to get some kind of automated
> testing setup to highlight differences. (something maybe kernelci
> could add?)

Given the absolutely massive test matrix of implementations and kernel
versions I question our ability to support this. Additionally the only
thing we'd care about upstream is the unsafe removal of a feature.

Nevertheless, a starting point could be to drop some pr_info() into
set_id_regs.c to print the starting value of the ID registers which
could be diffed.

-- 
Thanks,
Oliver



More information about the linux-arm-kernel mailing list