[PATCH v11 10/16] KVM: guest_memfd: Add flag to remove from direct map

Nikita Kalyazin kalyazin at amazon.com
Fri Apr 10 08:30:36 PDT 2026



On 23/03/2026 21:15, Ackerley Tng wrote:
> "Kalyazin, Nikita" <kalyazin at amazon.co.uk> writes:
> 
>>
>> [...snip...]
>>
>>   static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>>   {
>>        struct inode *inode = file_inode(vmf->vma->vm_file);
>>        struct folio *folio;
>>        vm_fault_t ret = VM_FAULT_LOCKED;
>> +     int err;
>>
>>        if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
>>                return VM_FAULT_SIGBUS;
>> @@ -418,6 +454,14 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
>>                folio_mark_uptodate(folio);
>>        }
>>
>> +     if (kvm_gmem_no_direct_map(folio_inode(folio))) {
>> +             err = kvm_gmem_folio_zap_direct_map(folio);
>> +             if (err) {
>> +                     ret = vmf_error(err);
>> +                     goto out_folio;
>> +             }
>> +     }
>> +
>>        vmf->page = folio_file_page(folio, vmf->pgoff);
>>
> 
> Sashiko pointed out that kvm_gmem_populate() might try and write to
> direct-map-removed folios, but I think that's handled because populate
> will first try and GUP folios, which is already blocked for
> direct-map-removed folios.

As far as I can see, it is a valid issue, as populate only GUPs the
source pages, not gmem.  I think this is similar to what was discussed
for TDX at some point, where the decision was to exclude TDX support [1].
I followed the same path and excluded SEV-SNP in patch 8 [2].  I kept
your and David's "Reviewed-by:" tags on that patch, but please let me
know if this makes you change your minds.

[1] https://lore.kernel.org/kvm/aWpcDrGVLrZOqdcg@google.com
[2] https://lore.kernel.org/kvm/20260410151746.61150-9-kalyazin@amazon.com

> 
>>   out_folio:
>> @@ -528,6 +572,9 @@ static void kvm_gmem_free_folio(struct folio *folio)
>>        kvm_pfn_t pfn = page_to_pfn(page);
>>        int order = folio_order(folio);
>>
>> +     if (kvm_gmem_folio_no_direct_map(folio))
>> +             kvm_gmem_folio_restore_direct_map(folio);
>> +
>>        kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
>>   }
>>
> 
> Sashiko says to invalidate then restore direct map, I think in this case
> it doesn't matter since if the folio needed invalidation, it must be
> private, and the host shouldn't be writing to the private pages anyway.
> 
> One benefit of retaining this order (restore, invalidate) is that it
> opens the invalidate hook to possibly do something regarding memory
> contents?
> 
> Or perhaps we should just take the suggestion (invalidate, restore) and
> align that invalidate should not touch memory contents.
> 
>> @@ -591,6 +638,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>>        /* Unmovable mappings are supposed to be marked unevictable as well. */
>>        WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
>>
>> +     if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP)
>> +             mapping_set_no_direct_map(inode->i_mapping);
>> +
>>        GMEM_I(inode)->flags = flags;
>>
>>        file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
>> @@ -803,13 +853,22 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>>        }
>>
>>        r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>> +     if (r)
>> +             goto out_unlock;
>>
>> +     if (kvm_gmem_no_direct_map(folio_inode(folio))) {
>> +             r = kvm_gmem_folio_zap_direct_map(folio);
>> +             if (r)
>> +                     goto out_unlock;
>> +     }
>> +
>>
>> [...snip...]
>>
> 
> Preparing a folio used to involve zeroing, but that has since been
> refactored out, so I believe zapping can come before preparing.
> 
> Similar to the above point on invalidation: perhaps we should take the
> suggestion to zap then prepare
> 
> + And align that preparation should not touch memory contents
> + Avoid needing to undo the preparation on zapping failure (.free_folio
>    is not called on folio_put(); it is only called on removal of the
>    folio from the filemap).

I reordered both, thanks.
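
A minimal userspace sketch of the two reordered sequences discussed above (zap then prepare, invalidate then restore) may help make the argument concrete. It is not the code from the series: struct mock_folio and every mock_* function are hypothetical stand-ins, and the zap_should_fail field exists only to exercise the error path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a guest_memfd folio; not the kernel's struct folio. */
struct mock_folio {
	bool in_direct_map;   /* true while the host direct map covers the folio */
	bool prepared;        /* true once the (mock) preparation step has run */
	bool invalidated;     /* true once the (mock) invalidate hook has run */
	bool zap_should_fail; /* illustration-only knob: force the zap to fail */
};

/* Mock of removing the folio from the direct map; may fail. */
static int mock_zap_direct_map(struct mock_folio *f)
{
	if (f->zap_should_fail)
		return -ENOMEM;
	f->in_direct_map = false;
	return 0;
}

/* Mock of preparation; assumed not to touch memory contents. */
static int mock_prepare(struct mock_folio *f)
{
	f->prepared = true;
	return 0;
}

/*
 * get_pfn-like path: zap first, prepare second.  If the zap fails, the
 * folio has not been prepared yet, so there is nothing to undo.
 */
int mock_get_pfn(struct mock_folio *f, bool no_direct_map)
{
	int r;

	if (no_direct_map) {
		r = mock_zap_direct_map(f);
		if (r)
			return r;
	}
	return mock_prepare(f);
}

/*
 * free_folio-like path: invalidate first, restore the direct map second,
 * so the invalidate step never relies on the folio being mapped.
 */
void mock_free_folio(struct mock_folio *f)
{
	f->invalidated = true;
	if (!f->in_direct_map)
		f->in_direct_map = true;
}
```

With this ordering, a failed zap returns the error while the folio is still unprepared and still in the direct map, which is the property the second bullet above asks for.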


