[PATCH v13 04/12] KVM: guest_memfd: Add slab-allocated inode cache

Mon Oct 27 05:25:23 PDT 2025

On 10/27/2025 4:36 PM, Vlastimil Babka wrote:
> On 10/16/25 19:28, Sean Christopherson wrote:
>> From: Shivank Garg <shivankg at amd.com>
>>
>> Add a dedicated gmem_inode structure and a slab-allocated inode cache for
>> guest memory backing, similar to how shmem handles inodes.
>>
>> This adds the necessary allocation/destruction functions and prepares
>> for upcoming guest_memfd NUMA policy support changes.  Using a dedicated
>> structure will also allow for additional cleanups, e.g. to track flags in
>> gmem_inode instead of i_private.
>>
>> Signed-off-by: Shivank Garg <shivankg at amd.com>
>> Tested-by: Ashish Kalra <ashish.kalra at amd.com>
>> [sean: s/kvm_gmem_inode_info/gmem_inode, name init_once()]
>> Reviewed-by: Ackerley Tng <ackerleytng at google.com>
>> Tested-by: Ackerley Tng <ackerleytng at google.com>
>> Signed-off-by: Sean Christopherson <seanjc at google.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka at suse.cz>
> 
> Some nits below, not critical unless there's resubmit for other reasons:

Hi Vlastimil,

Thank you for the review.

> 
>> @@ -860,13 +917,31 @@ static int kvm_gmem_init_mount(void)
>>  
>>  int kvm_gmem_init(struct module *module)
>>  {
>> +	struct kmem_cache_args args = {
>> +		.align = 0,
> 
> This seems unnecessary as it's implicit.

Ack

>> +		.ctor = kvm_gmem_init_inode_once,
>> +	};
>> +	int ret;
>> +
>>  	kvm_gmem_fops.owner = module;
>> +	kvm_gmem_inode_cachep = kmem_cache_create("kvm_gmem_inode_cache",
>> +						  sizeof(struct gmem_inode),
>> +						  &args, SLAB_ACCOUNT);
>> +	if (!kvm_gmem_inode_cachep)
>> +		return -ENOMEM;
>>  
>> -	return kvm_gmem_init_mount();
>> +	ret = kvm_gmem_init_mount();
>> +	if (ret) {
>> +		kmem_cache_destroy(kvm_gmem_inode_cachep);
>> +		return ret;
>> +	}
>> +	return 0;
>>  }
>>  
>>  void kvm_gmem_exit(void)
>>  {
>>  	kern_unmount(kvm_gmem_mnt);
>>  	kvm_gmem_mnt = NULL;
>> +	rcu_barrier();
> 
> Is it because VFS can do call_rcu() with something that ends up with
> kvm_gmem_free_inode()? Because nothing in this patch does that directly,
> maybe worth a comment?

Yes, exactly. I discovered this race condition while debugging a bug that
occurred during kvm_amd module unload after running a gmem-backed VM.

More details here:
https://lore.kernel.org/linux-mm/e7f7703d-fe76-4ab2-bef4-8d4c54da03ad@amd.com

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 427c0acee9d7..e1f69747fc84 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -969,7 +969,6 @@ static int kvm_gmem_init_mount(void)
 int kvm_gmem_init(struct module *module)
 {
 	struct kmem_cache_args args = {
-		.align = 0,
 		.ctor = kvm_gmem_init_inode_once,
 	};
 	int ret;
@@ -993,6 +992,15 @@ void kvm_gmem_exit(void)
 {
 	kern_unmount(kvm_gmem_mnt);
 	kvm_gmem_mnt = NULL;
+
+	/*
+	 * Wait for all pending RCU callbacks to complete before destroying
+	 * the inode cache. The VFS layer uses call_rcu() during inode
+	 * eviction (via evict_inodes() -> destroy_inode() -> call_rcu()),
+	 * which eventually calls kvm_gmem_free_inode().
+	 * All such callbacks must finish before kmem_cache_destroy(), or the
+	 * cache would be destroyed while inodes are still allocated from it.
+	 */
 	rcu_barrier();
 	kmem_cache_destroy(kvm_gmem_inode_cachep);
 }
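
For illustration only, the ordering described in the comment above can be
modelled in plain userspace C. This is a rough sketch, not kernel code: every
toy_* name is hypothetical, with toy_defer_free() standing in for call_rcu(),
toy_barrier() for rcu_barrier(), and toy_cache_destroy() for
kmem_cache_destroy(). The model is single-threaded and only counts objects, so
it shows the ordering problem, not the real RCU machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a slab cache: tracks liveness and outstanding objects. */
struct toy_cache {
	int alive;		/* 1 until toy_cache_destroy() runs */
	int outstanding;	/* objects allocated but not yet freed */
};

struct toy_obj {
	struct toy_cache *cache;
	struct toy_obj *next;	/* link in the deferred-free queue */
};

/* Models the callbacks queued by call_rcu() that have not run yet. */
static struct toy_obj *pending;

static struct toy_obj *toy_alloc(struct toy_cache *c)
{
	struct toy_obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->cache = c;
	o->next = NULL;
	c->outstanding++;
	return o;
}

/* Models call_rcu(): the free is queued, not performed. */
static void toy_defer_free(struct toy_obj *o)
{
	o->next = pending;
	pending = o;
}

/* Models rcu_barrier(): returns only after every queued free has run. */
static void toy_barrier(void)
{
	while (pending) {
		struct toy_obj *o = pending;

		pending = o->next;
		assert(o->cache->alive);
		o->cache->outstanding--;
		free(o);
	}
}

/* Models kmem_cache_destroy(): reports objects still outstanding. */
static int toy_cache_destroy(struct toy_cache *c)
{
	c->alive = 0;
	return c->outstanding;
}

/* Cleanup for the model only: drop queued objects without touching the cache. */
static void toy_discard_pending(void)
{
	while (pending) {
		struct toy_obj *o = pending;

		pending = o->next;
		free(o);
	}
}

/* Runs the teardown; returns the number of objects outstanding at destroy. */
static int toy_teardown(int use_barrier)
{
	struct toy_cache c = { .alive = 1, .outstanding = 0 };
	struct toy_obj *o = toy_alloc(&c);
	int outstanding;

	if (!o)
		return -1;
	toy_defer_free(o);	/* "unmount" evicts the inode and queues the free */
	if (use_barrier)
		toy_barrier();
	outstanding = toy_cache_destroy(&c);
	toy_discard_pending();
	return outstanding;
}
```

In this model, toy_teardown(1) finds no outstanding objects at destroy time,
while toy_teardown(0) destroys the cache with one object still queued for
freeing, which is the situation the rcu_barrier() call is meant to rule out.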


> 
>> +	kmem_cache_destroy(kvm_gmem_inode_cachep);
>>  }
>