[RFC PATCH] KVM: Ignore MMU notifiers for guest_memfd-only memslots

Mon Jun 15 08:52:44 PDT 2026

For guest_memfd-only memslots (kvm_memslot_is_gmem_only() is true), the
memory provider for the virtual machine is the guest_memfd file, not the
userspace mapping. Faults are resolved using the guest_memfd page cache,
and the permissions for the secondary MMU mapping depends exclusively on
the memslot (i.e, if the memslot is read-only). How userspace happens to
have the memory mmaped at fault time, or even if the memory is mapped at
all into userspace, is not taken into consideration.

guest_memfd memory is not evictable, is not movable and there's no backing
storage. Once memory is allocated for an offset in guest_memfd file, the
offset will not change, and that memory is not freed unless userspace
explicitly punches a hole in the file. As a result, memory reclaim, page
migration, page aging and dirty page tracking for the userspace mapping
serve little purpose.

Despite this, KVM's MMU notifiers still modify the secondary MMU page
tables, similar to ordinary memslots, only for the same memory to be
remapped next time a guest accesses it. Make the disconnect between the
user mapping and the secondary MMU page tables explicit by ignoring the MMU
notifiers for guest_memfd-only memslots.

Signed-off-by: Alexandru Elisei <alexandru.elisei at arm.com>
---
The only theoretical instance where the MMU notifiers are invoked for the
userspace mapping of a guest_memfd-only memslot that I was able to find was
automatic NUMA balancing with a non-NULL NUMA policy for the guest_memfd
file. I wasn't able to test it in practice. Also my knowledge of MM is very
limited, so there might be other cases where it happens, or I might be
wrong and today the MMU notifiers are never invoked.

Either way, when and if it happens, having memory unmapped from the
seconday MMU in the case of guest_memfd-only memslot is at most a
performance issue (it causes unnecessary guest faults), but I wanted to
start a conversation about this because having memory that stays mapped at
stage 2 (unless userspace explicitly unmaps it from the VM) is needed for a
Arm feature (called SPE, Statistical Profiling Extension) that I'm working
to upstream. This patch aims to provide the guarantee that memory won't be
unmapped from the secondary MMU behind the VMMs back, which is what happens
for non guest_memfd memslots.

 virt/kvm/kvm_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 881f92d7a469..8c4158996928 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -592,6 +592,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			unsigned long hva_start, hva_end;
 
 			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
+
+			if (kvm_slot_has_gmem(slot) && kvm_memslot_is_gmem_only(slot))
+				continue;
+
 			hva_start = max_t(unsigned long, range->start, slot->userspace_addr);
 			hva_end = min_t(unsigned long, range->end,
 					slot->userspace_addr + (slot->npages << PAGE_SHIFT));

base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
-- 
2.54.0