KVM/arm64: Guest ABI changes do not appear rollback-safe

Fri Oct 1 04:43:38 PDT 2021

On Thu, 30 Sep 2021 18:24:23 +0100,
Oliver Upton <oupton at google.com> wrote:
> 
> Hey Marc,
> 
> On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz at kernel.org> wrote:
> >
> > Hi Oliver,
> >
> > On Wed, 29 Sep 2021 19:22:05 +0100,
> > Oliver Upton <oupton at google.com> wrote:
> > >
> > > I have some lingering thoughts on this subject since we last spoke and
> > > wanted to discuss.
> > >
> > > I'm having a hard time figuring out how a VMM should handle a new
> > > hypercall identity register introduced on a newer kernel. In order to
> > > maintain guest ABI, the VMM would need to know about that register and
> > > zero it when restoring an older guest.
> >
> > Just as it would need to be able to discover any new system register
> > exposed by default, as it happens at all times. Which is why we have a
> > way to discover all the registers, architected or not.
> >
> > > Perhaps instead we could reserve a range of firmware registers as the
> > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > default, encouraging userspace to zero these registers away for older
> > > VMs but still allowing an old userspace to pick up new KVM features.
> > > Doing so would align the hypercall identity registers with the feature
> > > ID registers from the architecture.
> >
> > The range already exists in the form of the "coprocessor" 0x14. I
> > don't see the need to expose it as RAZ/WI, however. If userspace
> > doesn't know about how this pseudo-register works, it won't be able to
> > program it anyway.
> >
> > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > id-register is also well defined (feature not implemented), and the
> > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > mechanism.
> >
> > Our firmware registers don't follow the same rules. Userspace can
> > write to them, and there is no such "not implemented" rule (case in
> > point, PSCI). We also have a separate enumeration mechanism
> > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > what is implemented.
> >
> > For these reasons, I don't immediately see the point of advertising a
> > set of registers ahead of time, before userspace grows an
> > understanding of what these registers mean.
> 
> Supposing we don't preallocate some hypercall ID registers, how can we
> safely migrate a guest from an older kernel to newer kernel? Ideally,
> we would preserve the hypercall feature set across the migration which
> could work for a while with the first set of registers that get
> defined, but whenever a new hypercall firmware register comes along
> then the VMM will be clueless to the new ABI.

My expectations were that whenever userspace writes a set of firmware
register, all the defaults are invalidated. For example say that
host-A know about a single hypercall register, while host-B knows
about two. Upon writing to the first register, the host would clear
the set of available services in the second one. If userspace
eventually writes there, the value would stick if valid.

Also, remember that pseudo-registers don't have to be 64bit. We could
define a new class of hypercall-specific registers that would be much
wider, and thus have a larger write granularity (KVM supports anything
from 8 to 2048 bits). This would make it pretty easy to implement the
above.

> Fundamentally, I don't think userspace should need a patch to preserve
> ABI on a newer kernel. Despite that, it would seem that userspace will
> need to learn of any firmware registers that control hypercall
> features which come after the initial set that gets proposed. If
> KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> architectural feature ID registers) from other parts of the vCPU
> state, it would be clear to what registers to zero on a newer kernel.
> Apologies if it is distracting to mention the feature ID registers
> here, but both are on my mind currently and want to make sure there is
> some consistency in how features get handled on newer kernels,
> architected or not.

The problem I see is that we will always need to grow the HC register
space one way or another, no matter how many we reserve. Your approach
only works if we don't exceed that particular range. Maybe it will
never be a problem, but it remains that this is not scalable.

If we wanted to be safe, we'd reserve the whole of the possible space
as defined by the SMCCC spec. Given that we currently have two HC
spaces (the ARM-controlled one, and the KVM-specific one), the
function space being 16bits in each case, that's 16kB worth of zeroes
that userspace has to save/restore at all times... I'm not overly
enthusiastic.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.