[PATCH v10 09/15] KVM: guest_memfd: Add flag to remove from direct map

Nikita Kalyazin kalyazin at amazon.com
Fri Mar 6 06:49:18 PST 2026



On 06/03/2026 14:22, David Hildenbrand (Arm) wrote:
> [...]
> 
>>>> +     /*
>>>> +      * Direct map restoration cannot fail, as the only error condition
>>>> +      * for direct map manipulation is failure to allocate page tables
>>>> +      * when splitting huge pages, but this split would have already
>>>> +      * happened in folio_zap_direct_map() in
>>>> kvm_gmem_folio_zap_direct_map().
>>>> +      * Note that the splitting occurs always because guest_memfd
>>>> +      * currently supports only base pages.
>>>> +      * Thus folio_restore_direct_map() here only updates prot bits.
>>>> +      */
>>>> +     WARN_ON_ONCE(folio_restore_direct_map(folio));
>>>
>>> Which raised the question: why should this function then even return an
>>> error?
>>
>> Dave pointed earlier that the failures were possible [1].  Do you think
>> we can document it better?
> 
> I'm fine with checking that somewhere (to catch any future problems).
> 
> Why not do the WARN_ON_ONCE() in folio_restore_direct_map()?
> 
> Then, carefully document (in the new kerneldoc for
> folio_restore_direct_map() etc) that folio_restore_direct_map() is only
> allowed after a prior successful folio_zap_direct_map(), and add a
> helpful comment above the WARN_ON_ONCE() in folio_restore_direct_map()
> that we don't expect errors etc.

My only concern is that the assumptions we make in KVM may not apply in 
the general case, so the WARN_ON_ONCE() may become too restrictive 
compared to proper error handling in some (rare) cases.  For example, is 
it possible for the folio to migrate in between?

> 
> [...]
> 
>>>> -     if (!is_prepared)
>>>> +     if (!is_prepared) {
>>>>                 r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
>>>> +             if (r)
>>>> +                     goto out_unlock;
>>>> +     }
>>>> +
>>>> +     if (kvm_gmem_no_direct_map(folio_inode(folio))) {
>>>> +             r = kvm_gmem_folio_zap_direct_map(folio);
>>>> +             if (r)
>>>> +                     goto out_unlock;
>>>> +     }
>>>
>>>
>>> It's a bit nasty that we have two different places where we have to call
>>> this. Smells error prone.
>>
>> We will actually have two more: for the write() syscall and UFFDIO_COPY,
>> and zero once we have [2]
>>
>> [2] https://lore.kernel.org/linux-mm/20260225-page_alloc-unmapped-v1-0-
>> e8808a03cd66 at google.com/
>>
>>>
>>> I was wondering why kvm_gmem_get_folio() cannot handle that?
>>
>> Most of the call sites follow the pattern alloc -> write -> zap, so
>> they'll need the direct map for some time after the allocation.
>>
> 
> Okay. Nasty. :)
> 
> --
> Cheers,
> 
> David
