[RFC PATCH v3 0/6] Direct Map Removal for guest_memfd

David Hildenbrand david at redhat.com
Mon Nov 4 04:18:51 PST 2024


On 31.10.24 11:42, Patrick Roy wrote:
> On Thu, 2024-10-31 at 09:50 +0000, David Hildenbrand wrote:
>> On 30.10.24 14:49, Patrick Roy wrote:
>>> Unmapping virtual machine guest memory from the host kernel's direct map
>>> is a successful mitigation against Spectre-style transient execution
>>> issues: If the kernel page tables do not contain entries pointing to
>>> guest memory, then any attempted speculative read through the direct map
>>> will necessarily be blocked by the MMU before any observable
>>> microarchitectural side-effects happen. This means that Spectre-gadgets
>>> and similar cannot be used to target virtual machine memory. Roughly 60%
>>> of speculative execution issues fall into this category [1, Table 1].
>>>
>>> This patch series extends guest_memfd with the ability to remove its
>>> memory from the host kernel's direct map, to be able to attain the above
>>> protection for KVM guests running inside guest_memfd.
>>>
>>> === Changes to v2 ===
>>>
>>> - Handle direct map removal for physically contiguous pages in arch code
>>>     (Mike R.)
>>> - Track the direct map state in guest_memfd itself instead of at the
>>>     folio level, to prepare for huge pages support (Sean C.)
>>> - Allow configuring direct map state of not-yet faulted in memory
>>>     (Vishal A.)
>>> - Pay attention to alignment in ftrace structs (Steven R.)
>>>
>>> Most significantly, I've reduced the patch series to focus only on
>>> direct map removal for guest_memfd for now, leaving the whole "how to do
>>> non-CoCo VMs in guest_memfd" for later. If this separation is
>>> acceptable, then I think I can drop the RFC tag in the next revision
>>> (I've mainly kept it here because I'm not entirely sure what to do with
>>> patches 3 and 4).
>>
>> Hi,
>>
>> keeping upcoming "shared and private memory in guest_memfd" in mind, I
>> assume the focus would be to only remove the direct map for private memory?
>>
>> So in the current upstream state, you would only be removing the direct
>> map for private memory, currently translating to "encrypted"/"protected"
>> memory that is inaccessible either way already.
>>
>> Correct?
> 
> Yea, with the upcoming "shared and private" stuff, I would expect that
> the shared<->private conversions would call the routines from patch 3 to
> restore direct map entries on private->shared, and zap them on
> shared->private.

I wanted to follow up on the discussion we had in the bi-weekly call.

We talked about shared (faultable) vs. private (unfaultable) memory, and
how it would interact with the direct map patches here.

As discussed, having private (unfaultable) memory with the direct map
removed and shared (faultable) memory with the direct map in place can
make sense for non-TDX/AMD-SEV/... non-CoCo use cases. I'm not sure about
CoCo; the discussion here seems to indicate that it might not currently
be required.

So one thing we could do is that shared (faultable) memory has a direct
mapping and is GUP-able, while private (unfaultable) memory has no
direct mapping and is, by design, not GUP-able.

Maybe it could make sense to have no direct map for any guest_memfd
memory at all, making it behave like secretmem (and it would be easy to
implement)? But I'm not sure if that is really desirable in a VM context.

Having a mixture of "has directmap" and "has no directmap" for shared 
(faultable) memory should not be done. Similarly, private memory really 
should stay "unfaultable".
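
To make the split described above concrete, here is a toy userspace
model in Python. It is not kernel code, and every name in it (State,
GmemRange, convert, invariant_holds) is made up for illustration; it
only assumes that conversions restore direct map entries on
private->shared and zap them on shared->private, as discussed earlier in
the thread.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SHARED = "shared"    # faultable, GUP-able
    PRIVATE = "private"  # unfaultable, not GUP-able


@dataclass
class GmemRange:
    """Hypothetical stand-in for a range of guest_memfd memory."""
    state: State = State.PRIVATE
    in_direct_map: bool = False

    @property
    def faultable(self) -> bool:
        return self.state is State.SHARED

    @property
    def gup_able(self) -> bool:
        return self.state is State.SHARED

    def convert(self, to: State) -> None:
        # private->shared restores the direct map entries,
        # shared->private zaps them (modelled here as a flag flip).
        self.in_direct_map = to is State.SHARED
        self.state = to

    def invariant_holds(self) -> bool:
        # No mixture: shared always has a direct map, private never does.
        return self.in_direct_map == (self.state is State.SHARED)


r = GmemRange()
r.convert(State.SHARED)
print(r.faultable, r.gup_able, r.in_direct_map)   # True True True
r.convert(State.PRIVATE)
print(r.faultable, r.gup_able, r.in_direct_map)   # False False False
```

The only point of the model is that direct map state is derived from
shared/private state and never configured independently of it.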

I think one of the points raised during the bi-weekly call was that
using a viommu/swiotlb might be the right call, such that all memory
that is not explicitly shared/expected to be modified by the hypervisor
(-> faultable, -> GUP-able) can be considered private (unfaultable).

Further, I think Sean had some good points on why we should explore that
direction, but I recall that there were some issues to be sorted out
(interpreted instructions requiring the direct map when accessing
"private" memory?). I'm not sure whether that already works or can be
made to work in KVM.

What's your opinion after the call, and what would be the next step for
use cases like the one you have in mind (IIRC Firecracker, which wants to
avoid having the direct map for guest memory wherever possible)?

-- 
Cheers,

David / dhildenb



