[PATCH 1/5] KVM: arm64: Grab KVM MMU write lock in kvm_arch_flush_shadow_all()
Sean Christopherson
seanjc at google.com
Tue May 5 11:16:50 PDT 2026
On Tue, May 05, 2026, James Houghton wrote:
> On Tue, May 5, 2026 at 10:05 AM Sean Christopherson <seanjc at google.com> wrote:
> > There are more issues. kvm->arch.mmu.split_page_cache can be freed by
> > kvm_arch_commit_memory_region(), which holds slots_lock and slots_arch_lock,
> > but not mmu_lock.
>
> Thanks. I also noticed that kvm->arch.mmu.split_page_cache is
> documented as being protected by kvm->slots_lock; we should be holding
> it here. But we cannot take it here because we are already holding the
> KVM srcu lock.
>
> > IMO, the handling of kvm->arch.mmu.split_page_cache should be reworked. I don't
> > entirely get the motivation for aggressively freeing the cache. The cache will
> > only be filled if KVM actually does eager page splitting, so it's not like KVM is
> > burning pages for setups that will never use the cache.
> >
> > Maybe I'm underestimating how many pages arm64 needs in the worst case scenario?
> > (I can't follow the math, too many macros). But if KVM is configuring the cache
> > with a capacity that's _so_ high that the "wasted" memory is problematic, then we
> > probably should we revisit the capacity and algorithm. E.g. if KVM is splitting
> > from 1GiB => 4KiB in a single pass (I can't tell if KVM does this on arm64), then
> > we could break that into a 1GiB => 2MiB => 4KiB sequence.
>
> I'm not sure I've fully understood the point you're making, but I
> *think* we can just drop the
> kvm_mmu_free_memory_cache(&kvm->arch.mmu.split_page_cache);
> line from kvm_uninit_stage2_mmu(). It will get freed when the VM is
> destroyed anyway.
It's not that simple. KVM arm64 allows userspace to reconfigure the capacity of
the cache via KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE. kvm_vm_ioctl_enable_cap()
currently allows userspace to do that so long as there are no memslots.
__kvm_mmu_topup_memory_cache() will (rightly) yell and fail if it's called with
the "wrong" capacity, so we'd need to sort that out.
The other issue is that it's not clear to me what happens for large "chunk" sizes.
If KVM is splitting from 1GiB (or whatever the largest hugepage size supported on
arm64 is) all the way down to 4KiB, e.g. to optimize against break-before-make, then
the capacity of the cache could be significant, e.g. multiple MiB of memory or worse. My read
of things is that purging the cache when dirty logging is disabled is a guard
against consuming too much memory when the chunk size is large.
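To put rough numbers on it (purely back-of-the-envelope, modeled on what I *think*
the arm64 capacity math works out to for a 4KiB granule, so take it with a grain of
salt): splitting a 1GiB chunk all the way down to 4KiB needs one table to shatter the
1GiB block plus one table per 2MiB range, i.e. on the order of 513 pages, or ~2MiB of
page-table memory parked in the cache.

	/*
	 * Hypothetical back-of-the-envelope, NOT the actual arm64 code: how
	 * many page-table pages are needed to split a "chunk" of block
	 * mappings down to 4KiB, assuming a 4KiB granule (1GiB PUD blocks,
	 * 2MiB PMD blocks).
	 */
	#include <stdio.h>

	#define SZ_4K	(1ULL << 12)
	#define SZ_2M	(1ULL << 21)
	#define SZ_1G	(1ULL << 30)

	static unsigned long long nr_split_page_tables(unsigned long long chunk)
	{
		/* One table per 1GiB range, to replace each PUD block... */
		unsigned long long n = (chunk + SZ_1G - 1) / SZ_1G;

		/* ...plus one table per 2MiB range, to replace each PMD block. */
		n += (chunk + SZ_2M - 1) / SZ_2M;

		return n;
	}

	int main(void)
	{
		unsigned long long chunk = SZ_1G;
		unsigned long long pages = nr_split_page_tables(chunk);

		/* 1GiB chunk => 1 + 512 = 513 pages => ~2MiB of tables. */
		printf("%lluMiB chunk => %llu pages (~%lluKiB)\n",
		       chunk >> 20, pages, pages * SZ_4K >> 10);
		return 0;
	}

Which is also why I asked about a 1GiB => 2MiB => 4KiB sequence above: splitting
level by level would bound how many tables any single pass needs in the cache.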