[PATCH v9 07/13] KVM: guest_memfd: Add flag to remove from direct map
Nikita Kalyazin
kalyazin at amazon.com
Thu Jan 22 10:47:41 PST 2026
On 22/01/2026 18:37, Ackerley Tng wrote:
> Nikita Kalyazin <kalyazin at amazon.com> writes:
>
>> On 16/01/2026 00:00, Edgecombe, Rick P wrote:
>>> On Wed, 2026-01-14 at 13:46 +0000, Kalyazin, Nikita wrote:
>>>> +static void kvm_gmem_folio_restore_direct_map(struct folio *folio)
>>>> +{
>>>> + /*
>>>> + * Direct map restoration cannot fail, as the only error condition
>>>> + * for direct map manipulation is failure to allocate page tables
>>>> + * when splitting huge pages, but this split would have already
>>>> + * happened in folio_zap_direct_map(), called from
>>>> + * kvm_gmem_folio_zap_direct_map().
>
> Do you know if folio_restore_direct_map() will also end up merging page
> table entries to a higher level?
Looking at the call chain, on x86 at least, I can't see how it would; the
restore path should only be putting the protection bits back (see the
sketch after the quoted hunk below).
>
>>>> + * Thus folio_restore_direct_map() here only updates prot bits.
>>>> + */
>>>> + if (kvm_gmem_folio_no_direct_map(folio)) {
>>>> + WARN_ON_ONCE(folio_restore_direct_map(folio));
>>>> + folio->private = (void *)((u64)folio->private & ~KVM_GMEM_FOLIO_NO_DIRECT_MAP);
>>>> + }
>>>> +}
>>>> +
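For reference, the invariant the comment relies on is that the zap side is
the only place where a split, and therefore a page table allocation
failure, can happen.  Roughly (only a sketch; the actual body of
kvm_gmem_folio_zap_direct_map() in the patch differs in details):

static int kvm_gmem_folio_zap_direct_map(struct folio *folio)
{
	int r;

	/*
	 * Splitting a huge direct-map mapping happens here, so this is the
	 * only step that can fail (page table allocation for the split).
	 */
	r = folio_zap_direct_map(folio);
	if (r)
		return r;

	/* Record that the restore side has work to do. */
	folio->private = (void *)((u64)folio->private | KVM_GMEM_FOLIO_NO_DIRECT_MAP);

	return 0;
}

Once that has succeeded, the restore path only needs to put the protection
bits back, hence the WARN_ON_ONCE() above.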
>>>
>>> Does this assume the folio would not have been split after it was zapped? As in,
>>> if it was zapped at 2MB granularity (no 4KB direct map split required) but then
>>> restored at 4KB (split required)? Or it gets merged somehow before this?
>
> I agree with the rest of the discussion that this will probably land
> before huge page support, so I will have to figure out the intersection
> of the two later.
>
>>
>> AFAIK it can't be zapped at 2MB granularity: the zapping code will
>> inevitably cause splitting, because guest_memfd faults occur at base
>> page granularity as of now.
>
> Here's what I'm thinking for now:
>
> [HugeTLB, no conversions]
> With initial HugeTLB support (no conversions), host userspace
> guest_memfd faults will be:
>
> + For guest_memfd with PUD-sized pages
> + At PUD level or PTE level
> + For guest_memfd with PMD-sized pages
> + At PMD level or PTE level
>
> Since this guest_memfd doesn't support conversions, the folio is never
> split/merged, so the direct map is restored at whatever level it was
> zapped. I think this works out well.
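Since the flag lives in folio->private, the bookkeeping is per folio
whatever its size, so zap and restore naturally operate on the same unit.
The kvm_gmem_folio_no_direct_map() check used in the hunk above presumably
boils down to something like this (sketch, mirroring the flag manipulation
in the patch):

static bool kvm_gmem_folio_no_direct_map(struct folio *folio)
{
	/*
	 * One flag per folio: a PMD- or PUD-sized folio that was never
	 * split is zapped and restored as a single unit.
	 */
	return (u64)folio->private & KVM_GMEM_FOLIO_NO_DIRECT_MAP;
}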
>
> [HugeTLB + conversions]
> For a guest_memfd with HugeTLB support and conversions, host userspace
> guest_memfd faults will always be at PTE level, so the direct map will
> be split and the faulted pages have the direct map zapped in 4K chunks
> as they are faulted.
>
> On conversion back to private, put those back into the direct map
> (putting aside whether to merge the direct map PTEs for now).
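If it helps to see it spelled out, the restore-on-conversion step would
then be a walk over the converted range, restoring each folio that was
zapped at fault time.  A sketch only: kvm_gmem_restore_direct_map_range()
and where exactly it gets called from are hypothetical, only the per-folio
restore helper and flag come from this patch:

static void kvm_gmem_restore_direct_map_range(struct inode *inode,
					      pgoff_t start, pgoff_t end)
{
	pgoff_t index = start;

	while (index < end) {
		struct folio *folio = filemap_lock_folio(inode->i_mapping, index);

		if (IS_ERR(folio)) {
			/* Never faulted, so never zapped; nothing to restore. */
			index++;
			continue;
		}

		/* No-op for folios that still have their direct map entries. */
		kvm_gmem_folio_restore_direct_map(folio);

		index = folio_next_index(folio);
		folio_unlock(folio);
		folio_put(folio);
	}
}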
>
>
> Unfortunately there's no unmapping callback for guest_memfd to use, so
> perhaps the principle should be to put the folios back into the direct
> map ASAP - at unmapping if guest_memfd is doing the unmapping, otherwise
> at freeing time?
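Restoring at freeing time at least has an obvious place to hook into,
since guest_memfd already has a ->free_folio callback.  Something like
this (sketch; the existing kvm_gmem_free_folio() is reduced to the part
relevant here):

static void kvm_gmem_free_folio(struct folio *folio)
{
	/*
	 * Last-chance restore before the page goes back to the allocator:
	 * a no-op if the folio was never zapped or has already been
	 * restored (e.g. on conversion).
	 */
	kvm_gmem_folio_restore_direct_map(folio);
}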