[PATCH v10 02/15] set_memory: add folio_{zap,restore}_direct_map helpers
Nikita Kalyazin
kalyazin at amazon.com
Fri Mar 6 06:48:57 PST 2026
On 06/03/2026 14:17, David Hildenbrand (Arm) wrote:
> On 3/6/26 13:48, Nikita Kalyazin wrote:
>>
>>
>> On 05/03/2026 17:34, David Hildenbrand (Arm) wrote:
>>> On 1/26/26 17:47, Kalyazin, Nikita wrote:
>>>> From: Nikita Kalyazin <kalyazin at amazon.com>
>>>>
>>>> These allow guest_memfd to remove its memory from the direct map.
>>>> Only implement them for architectures that have direct map.
>>>> In folio_zap_direct_map(), flush TLB on architectures where
>>>> set_direct_map_valid_noflush() does not flush it internally.
>>>
>>> "Let's provide folio_{zap,restore}_direct_map helpers as preparation for
>>> supporting removal of the direct map for guest_memfd folios. ...
>>
>> Will update, thanks.
>>
>>>
>>>>
>>>> The new helpers need to be accessible to KVM on architectures that
>>>> support guest_memfd (x86 and arm64). Since arm64 does not support
>>>> building KVM as a module, only export them on x86.
>>>>
>>>> Direct map removal gives guest_memfd the same protection that
>>>> memfd_secret does, such as hardening against Spectre-like attacks
>>>> through in-kernel gadgets.
>>>
>>> Would it be possible to convert mm/secretmem.c as well?
>>>
>>> There, we use
>>>
>>> set_direct_map_invalid_noflush(folio_page(folio, 0));
>>>
>>> and
>>>
>>> set_direct_map_default_noflush(folio_page(folio, 0));
>>>
>>> Which is a bit different to below code. At least looking at the x86
>>> variants, I wonder why we don't simply use
>>> set_direct_map_valid_noflush().
>>>
>>>
>>> If so, can you add a patch to do the conversion, pleeeeassse ? :)
>>
>> Absolutely!
>>
>>>
>>>>
>>>> Reviewed-by: Ackerley Tng <ackerleytng at google.com>
>>>> Signed-off-by: Nikita Kalyazin <kalyazin at amazon.com>
>>>> ---
>>>> arch/arm64/include/asm/set_memory.h | 2 ++
>>>> arch/arm64/mm/pageattr.c | 12 ++++++++++++
>>>> arch/loongarch/include/asm/set_memory.h | 2 ++
>>>> arch/loongarch/mm/pageattr.c | 12 ++++++++++++
>>>> arch/riscv/include/asm/set_memory.h | 2 ++
>>>> arch/riscv/mm/pageattr.c | 12 ++++++++++++
>>>> arch/s390/include/asm/set_memory.h | 2 ++
>>>> arch/s390/mm/pageattr.c | 12 ++++++++++++
>>>> arch/x86/include/asm/set_memory.h | 2 ++
>>>> arch/x86/mm/pat/set_memory.c | 20 ++++++++++++++++++++
>>>> include/linux/set_memory.h | 10 ++++++++++
>>>> 11 files changed, 88 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/
>>>> include/asm/set_memory.h
>>>> index c71a2a6812c4..49fd54f3c265 100644
>>>> --- a/arch/arm64/include/asm/set_memory.h
>>>> +++ b/arch/arm64/include/asm/set_memory.h
>>>> @@ -15,6 +15,8 @@ int set_direct_map_invalid_noflush(const void *addr);
>>>> int set_direct_map_default_noflush(const void *addr);
>>>> int set_direct_map_valid_noflush(const void *addr, unsigned long
>>>> numpages,
>>>> bool valid);
>>>> +int folio_zap_direct_map(struct folio *folio);
>>>> +int folio_restore_direct_map(struct folio *folio);
>>>> bool kernel_page_present(struct page *page);
>>>>
>>>> int set_memory_encrypted(unsigned long addr, int numpages);
>>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>>> index e2bdc3c1f992..0b88b0344499 100644
>>>> --- a/arch/arm64/mm/pageattr.c
>>>> +++ b/arch/arm64/mm/pageattr.c
>>>> @@ -356,6 +356,18 @@ int set_direct_map_valid_noflush(const void
>>>> *addr, unsigned long numpages,
>>>> return set_memory_valid((unsigned long)addr, numpages, valid);
>>>> }
>>>>
>>>> +int folio_zap_direct_map(struct folio *folio)
>>>> +{
>>>> + return set_direct_map_valid_noflush(folio_address(folio),
>>>> + folio_nr_pages(folio), false);
>>>> +}
>>>> +
>>>> +int folio_restore_direct_map(struct folio *folio)
>>>> +{
>>>> + return set_direct_map_valid_noflush(folio_address(folio),
>>>> + folio_nr_pages(folio), true);
>>>> +}
>>>
>>> Is there a good reason why we cannot have two generic inline functions
>>> that simply call set_direct_map_valid_noflush() ?
>>>
>>> Is it because of some flushing behavior? (which we could figure out)
>>
>> Yes, on x86 we need an explicit flush. Other architectures deal with it
>> internally.
>
> So, we call a _noflush function and it performs a ... flush. What.
Yeah, that's unfortunately the status quo, as pointed out by Aneesh [1].
[1] https://lore.kernel.org/kvm/yq5ajz07czvz.fsf@kernel.org/
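To make the oddity concrete, here is a rough sketch (mine, not copied from the patch) of what the x86 wrapper ends up doing: the `_noflush` primitive genuinely skips the flush there, so the wrapper has to add it explicitly, while other architectures flush internally.

```c
/*
 * Sketch of the x86 variant, assuming the names from this patch.
 * set_direct_map_valid_noflush() on x86 really is no-flush, so the
 * wrapper issues the TLB flush itself.
 */
int folio_zap_direct_map(struct folio *folio)
{
	unsigned long addr = (unsigned long)folio_address(folio);
	int ret;

	ret = set_direct_map_valid_noflush(folio_address(folio),
					   folio_nr_pages(folio), false);
	flush_tlb_kernel_range(addr, addr + folio_size(folio));
	return ret;
}
```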
>
> Take a look at secretmem_fault(), where we do an unconditional
> flush_tlb_kernel_range().
>
> Do we end up double-flushing in that case?
Yes, it looks like it. I'll remove the explicit flush and rely on
folio_zap_direct_map().
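Something along these lines, I think (hypothetical diff, just to illustrate the shape of the conversion; the removed lines paraphrase what secretmem_fault() does today rather than quoting it exactly):

```diff
-	err = set_direct_map_invalid_noflush(folio_page(folio, 0));
-	if (err)
-		goto out;
-	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+	err = folio_zap_direct_map(folio);
+	if (err)
+		goto out;
```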
>
>> Do you propose a bespoke implementation for x86 and a
>> "generic" one for others?
>
> We have to find a way to have a single set of functions for all archs
> that support directmap removal.
I believe Dave meant to address that with
folio_{zap,restore}_direct_map() [2].
[2]
https://lore.kernel.org/kvm/9409531b-589b-4a54-b122-06a3cf0846f3@intel.com/
>
> One option might be to have some indication from the architecture that
> no flush_tlb_kernel_range() is required.
>
> Could be a config option or some simple helper function.
I'd like to hear what the arch maintainers think, as I don't have
a strong opinion on this.
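For the sake of discussion, one possible shape of such an indication (entirely hypothetical, `arch_direct_map_zap_flushes_tlb` is not an existing interface) could be an arch-overridable helper that lets a single generic implementation decide whether it still owes a flush:

```c
/*
 * Hypothetical opt-in: an arch whose set_direct_map_valid_noflush()
 * already flushes the TLB overrides this to return true.
 */
#ifndef arch_direct_map_zap_flushes_tlb
static inline bool arch_direct_map_zap_flushes_tlb(void)
{
	return false;	/* generic case: caller must flush */
}
#endif

static inline int folio_zap_direct_map(struct folio *folio)
{
	unsigned long addr = (unsigned long)folio_address(folio);
	int ret;

	ret = set_direct_map_valid_noflush(folio_address(folio),
					   folio_nr_pages(folio), false);
	if (!arch_direct_map_zap_flushes_tlb())
		flush_tlb_kernel_range(addr, addr + folio_size(folio));
	return ret;
}
```

A Kconfig symbol would work just as well; the helper form only has the advantage of keeping the flush logic in one place.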
>
> --
> Cheers,
>
> David