[PATCH v5 4/4] KVM: mmu: remove over-aggressive warnings

Fri Jan 7 08:31:33 PST 2022

On Fri, Jan 07, 2022, David Stevens wrote:
> > > These are the type of pages which KVM is currently rejecting. Is this
> > > something that KVM can support?
> >
> > I'm not opposed to it.  My complaint is that this series is incomplete in that it
> > allows mapping the memory into the guest, but doesn't support accessing the memory
> > from KVM itself.  That means for things to work properly, KVM is relying on the
> > guest to use the memory in a limited capacity, e.g. isn't using the memory as
> > general purpose RAM.  That's not problematic for your use case, because presumably
> > the memory is used only by the vGPU, but as is KVM can't enforce that behavior in
> > any way.
> >
> > The really gross part is that failures are not strictly punted to userspace;
> > the resulting error varies significantly depending on how the guest "illegally"
> > uses the memory.
> >
> > My first choice would be to get the amdgpu driver "fixed", but that's likely an
> > unreasonable request since it sounds like the non-KVM behavior is working as intended.
> >
> > One thought would be to require userspace to opt-in to mapping this type of memory
> > by introducing a new memslot flag that explicitly states that the memslot cannot
> > be accessed directly by KVM, i.e. can only be mapped into the guest.  That way,
> > KVM has an explicit ABI with respect to how it handles this type of memory, even
> > though the semantics of exactly what will happen if userspace/guest violates the
> > ABI are not well-defined.  And internally, KVM would also have a clear touchpoint
> > where it deliberately allows mapping such memslots, as opposed to the more implicit
> > behavior of bypassing ensure_pfn_ref().
> 
> Is it well defined when KVM needs to directly access a memslot?

Not really, there's certainly no established rule.

> At least for x86, it looks like most of the use cases are related to nested
> virtualization, except for the call in emulator_cmpxchg_emulated.

The emulator_cmpxchg_emulated() usage will hopefully go away in the nearish future[*].
Paravirt features that communicate between guest and host via memory are the other
case that often maps a pfn into KVM.

> Without being able to specifically state what should be avoided, a flag like
> that would be difficult for userspace to use.

Yeah :-(  I was thinking KVM could state the flag would be safe to use if and only
if userspace could guarantee that the guest would use the memory for some "special"
use case, but hadn't actually thought about how to word things.
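
As a rough illustration of what such an opt-in could look like, here is a toy
userspace model.  It is not KVM code: TOY_MEM_GUEST_ONLY, struct toy_memslot and
the toy_*() helpers are all hypothetical names made up for this sketch, and no
such flag exists in the KVM uAPI.  The only point is that host-side access to a
flagged slot fails in one well-defined place, while mapping it into the guest is
still allowed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag for illustration only; NOT part of the KVM uAPI. */
#define TOY_MEM_GUEST_ONLY (1u << 0)

/* Toy stand-in for a memslot; this is not struct kvm_memory_slot. */
struct toy_memslot {
	uint64_t base_gfn;
	uint64_t npages;
	uint32_t flags;
};

static bool toy_slot_contains(const struct toy_memslot *slot, uint64_t gfn)
{
	assert(slot);
	return gfn >= slot->base_gfn && gfn - slot->base_gfn < slot->npages;
}

/* Mapping into the guest is allowed for any gfn inside the slot. */
static int toy_map_into_guest(const struct toy_memslot *slot, uint64_t gfn)
{
	return toy_slot_contains(slot, gfn) ? 0 : -EFAULT;
}

/*
 * Host-side access is refused up front for guest-only slots, so the
 * failure mode does not depend on how the guest uses the memory.
 */
static int toy_host_access(const struct toy_memslot *slot, uint64_t gfn)
{
	if (!toy_slot_contains(slot, gfn))
		return -EFAULT;
	if (slot->flags & TOY_MEM_GUEST_ONLY)
		return -EACCES;
	return 0;
}
```

The choice of -EACCES is arbitrary; what the error should be, and who it gets
reported to, is exactly the part that is hard to word.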

The best thing to do is probably to wait for kvm_vcpu_map() to be eliminated,
as described in the changelogs for commits:

  357a18ad230f ("KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache")
  7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status")

Once that is done, everything in KVM will either access guest memory through the
userspace hva, or via a mechanism that is tied into the mmu_notifier, at which
point accessing non-refcounted struct pages is safe and just needs to worry about
not corrupting _refcount.
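
To make the "not corrupting _refcount" part concrete, a minimal userspace
sketch follows.  It is a toy model, not kernel code: struct toy_page and the
toy_*() helpers are hypothetical, and it assumes that "non-refcounted" simply
means the count is zero.  The idea shown is that a reference is taken only when
the page is already refcounted, and the count is dropped only if it was taken,
so a zero count is never touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; refcount == 0 models a non-refcounted page. */
struct toy_page {
	int refcount;
};

/*
 * Take a reference only if the page is already refcounted.
 * Returns true if a reference was taken and must later be dropped.
 */
static bool toy_try_get(struct toy_page *page)
{
	assert(page);
	if (page->refcount == 0)
		return false;	/* leave the zero count untouched */
	page->refcount++;
	return true;
}

/* Drop the reference only if toy_try_get() actually took one. */
static void toy_put(struct toy_page *page, bool took_ref)
{
	assert(page);
	if (took_ref)
		page->refcount--;
}
```

In this model the caller carries took_ref from the get to the put; what keeps
the access itself safe without a reference (the mmu_notifier tie-in) is
outside what the sketch covers.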