[PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags

Jason Gunthorpe jgg at nvidia.com
Mon Mar 31 07:56:43 PDT 2025


On Wed, Mar 26, 2025 at 11:24:32AM -0700, Sean Christopherson wrote:
> > I don't know how you reconcile the lack of host mapping and cache
> > maintenance. The latter cannot take place without the former.
> 
> I assume cache maintenance only requires _a_ mapping to the physical memory.
> With guest_memfd, KVM has the pfn (which happens to always be struct page memory
> today), and so can establish a VA=>PA mapping as needed.

This is why we are forcing FWB in this work, because we don't have a
VA mapping and KVM doesn't have the code to create one on demand.

IMHO, I strongly suspect that all CCA-capable ARM systems will have
FWB, so it may make the most sense to continue in that direction when
using guest_memfd. Though I don't know what that would mean for pKVM.
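
To be concrete about the gate: a minimal sketch, assuming the existing
ARM64_HAS_STAGE2_FWB cpucap is what decides it (the helper name is made
up, not existing KVM code):

#include <asm/cpufeature.h>

/*
 * Hypothetical helper: cacheable S2 mappings of memory KVM has no host
 * VA for are only safe when FWB forces the final memory attributes from
 * the stage-2 tables, so no CMOs through a host mapping are needed.
 */
static bool kvm_cacheable_s2_without_cmo(void)
{
	return cpus_have_final_cap(ARM64_HAS_STAGE2_FWB);
}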

> > > I agree a capability is mandatory if we're adding a memslot flag, but I don't
> > > think it's mandatory if this is all handled through kernel plumbing.
> > 
> > It is mandatory, full stop. Otherwise, userspace is able to migrate a
> > VM from an FWB host to a non-FWB one, start the VM, blow up on the
> > first page fault. That's not an acceptable outcome.

It is fine if you add a protective check during memslot creation. If
qemu asks for a memslot backed by a cacheable VFIO VMA on a host
without FWB support, then fail it immediately, and that will safely
abort the migration. Do not defer the failure to page fault time,
though a defensive re-check at fault time is still warranted.
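
As a sketch only, inside the VMA walk that already happens in
kvm_arch_prepare_memory_region() (vma_is_cacheable() is a hypothetical
helper that would look at vma->vm_page_prot):

/*
 * Sketch of the protective check at memslot creation time: a cacheable
 * VFIO/PFNMAP VMA on a host without FWB is refused up front rather
 * than silently degraded or left to fail at fault time.
 */
static int kvm_check_pfnmap_memslot(struct vm_area_struct *vma)
{
	if (!(vma->vm_flags & VM_PFNMAP))
		return 0;

	if (vma_is_cacheable(vma) &&			/* hypothetical */
	    !cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
		return -EINVAL;

	return 0;
}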

This is good enough for VFIO device live migration, as "try and fail"
broadly matches the other HW compatibility checks VFIO device live
migration does today.

> Ah, the safety I'm talking about is the CMO requirement.  IIUC, not doing CMOs
> if the memory is cacheable could result in data corruption, i.e. would be a safety
> issue for the host.  But I missed that you were proposing that the !FWB behavior
> would be to force device mappings.

It only forces device mappings on the VM side, so you still have the
safety issue on the host side where the VM will see the physical
contents of the memory but the hypervisor never flushes or otherwise
synchronizes the caches. Ie data could leak from VM A to VM B because
the cache was not flushed and A's data is still in physical memory.
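
For illustration, the maintenance being skipped is roughly the below,
and it presumes a kernel VA that the PFNMAP case doesn't have
(dcache_clean_inval_poc() is the arm64 primitive; the wrapper is just a
sketch):

#include <asm/cacheflush.h>

/*
 * Sketch: clean+invalidate a page to the Point of Coherency before the
 * physical memory is handed from VM A to VM B, so A's dirty cache
 * lines cannot leak. This needs a kernel VA for the page, which is
 * exactly what is missing here.
 */
static void sanitize_page_before_reuse(void *va)
{
	dcache_clean_inval_poc((unsigned long)va,
			       (unsigned long)va + PAGE_SIZE);
}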

> Agreed, but that doesn't require a memslot flag.  A capability to enumerate that
> KVM can do cacheable mappings for PFNMAP memory would suffice.  And if we want to
> have KVM reject memslots that are cacheable in the VMA, but would get device in
> stage-2, then we can provide that functionality through the capability, i.e. let
> userspace decide if it wants "fallback to device" vs. "error on creation" on a
> per-VM basis.

I think we must block "fallback to device" as a security fix. So
"error on creation" is the only option left.

If that means some scenarios start to fail, and we consider that
important to fix, then the fix is to make the S2 cacheable and add the
missing CMOs.
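
Ie at fault time the policy is to block the device downgrade outright,
something like the below sketch (vma_is_cacheable() is again a
hypothetical helper, and "device" stands in for the existing
user_mem_abort() logic):

/*
 * Sketch of the fault-time side: never silently turn a cacheable VMA
 * into a device mapping. Either FWB lets us map it cacheable in the
 * S2, or the fault fails.
 */
static int check_cacheable_pfnmap_fault(struct vm_area_struct *vma,
					bool *device)
{
	if (!vma_is_cacheable(vma) || !(vma->vm_flags & VM_PFNMAP))
		return 0;

	if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
		return -EFAULT;		/* refuse; no fallback to device */

	*device = false;		/* map it cacheable in the S2 */
	return 0;
}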

Jason


