KVM/arm64: Guest ABI changes do not appear rollback-safe

Wed Aug 25 03:02:28 PDT 2021

On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz at kernel.org> wrote:
> > Exposing new hypercalls to guests in this manner seems very unsafe to
> > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > N+1, which brings in the new 'widget' hypercall. Guests are live
> > migrated onto the N+1 kernel, but the operator finds a defect that
> > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > Any guests that discovered the 'widget' hypercall are likely going to
> > get fussy _very_ quickly on the old kernel.
>
> This goes against what we decided to support for the *only* publicly
> available VMM that cares about save/restore, which is that we only
> move forward and don't rollback.

Ah, I was definitely missing this context. Current behavior makes much
more sense then.

> Hypercalls are the least of your
> worries, and there is a whole range of other architectural features
> that will have also appeared/disappeared (your own CNTPOFF series is a
> glaring example of this).

Isn't that a tad bit different though? I'll admit, I'm just as guilty
with my own series forgetting to add a KVM_CAP (oops), but it is in my
queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
takes up a new KVM UAPI, it is up to the user to run on a new kernel.

My concerns are explicitly with the 'under the nose' changes, where
KVM modifies the guest feature set without userspace opting in. Based
on your comment, though, it would appear that other parts of KVM are
affected too. It doesn't have to be rollback safety, either. There may
simply be a hypercall which an operator doesn't want to give its
guests, and it needs a way to tell KVM to hide it.

> > Have I missed something blatantly obvious, or do others see this as an
> > issue as well? I'll reply with an example of adding opt-out for PTP.
> > I'm sure other hypercalls could be handled similarly.
>
> Why do we need this? For future hypercalls, we could have some buy-in
> capabilities. For existing ones, it is too late, and negative features
> are just too horrible.

Oh, agreed on the nastiness. It was a lazy hack to realize the intended
functional change.

> For KVM-specific hypercalls, we could get the VMM to save/restore the
> bitmap of supported functions. That would be "less horrible". This
> could be implemented using extra "firmware pseudo-registers" such as
> the ones described in Documentation/virt/kvm/arm/psci.rst.

This seems more reasonable, especially since we do this for migrating
the guest's PSCI version.
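
To make the bitmap idea concrete, here is a rough userspace model of what
I have in mind. It is only a sketch, not KVM code: the feature bits, the
structs and the helper names are all made up for illustration, and the
get/set helpers merely stand in for the VMM reading and writing a firmware
pseudo-register.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits, one per KVM-specific hypercall. */
#define HYP_FEAT_BASE	(UINT64_C(1) << 0)
#define HYP_FEAT_PTP	(UINT64_C(1) << 1)

/* Toy model of a host kernel: the hypercalls it implements. */
struct host {
	uint64_t supported;
};

/* Toy model of per-VM state: the hypercalls the guest may discover. */
struct vm {
	uint64_t exposed;
};

/* VM creation: expose everything the host supports by default. */
static void vm_init(struct vm *vm, const struct host *host)
{
	vm->exposed = host->supported;
}

/* Stands in for reading the pseudo-register on the source host. */
static uint64_t vm_get_features(const struct vm *vm)
{
	return vm->exposed;
}

/*
 * Stands in for writing the pseudo-register on the destination host.
 * A subset of the host's features is accepted, which hides the rest
 * from the guest. Anything the host lacks is refused, so the VMM
 * learns about the mismatch before the guest does.
 */
static int vm_set_features(struct vm *vm, const struct host *host,
			   uint64_t features)
{
	if (features & ~host->supported)
		return -EINVAL;

	vm->exposed = features;
	return 0;
}
```

In this model, restoring a bitmap saved on kernel N onto kernel N+1
succeeds and keeps the new hypercall hidden, while restoring a bitmap
saved on N+1 onto N fails up front instead of leaving the guest to trip
over a missing hypercall later.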

Alternatively, I had thought about using a VM attribute, given that
this is non-architectural information and that it would avoid ABI
issues with KVM_GET_REG_LIST, where new registers would otherwise show
up without buy-in through a KVM_CAP.

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.