[RFC PATCH v3 0/6] Direct Map Removal for guest_memfd

David Hildenbrand david at redhat.com
Fri Nov 15 09:10:33 PST 2024


On 15.11.24 17:59, Patrick Roy wrote:
> 
> 
> On Tue, 2024-11-12 at 14:52 +0000, David Hildenbrand wrote:
>> On 12.11.24 15:40, Patrick Roy wrote:
>>> I remember talking to someone at some point about whether we could reuse
>>> the proc-local stuff for guest memory, but I cannot remember the outcome
>>> of that discussion... (or maybe I just wanted to have a discussion about
>>> it, but forgot to follow up on that thought?).  I guess we wouldn't use
>>> proc-local _allocations_, but rather just set up proc-local mappings of
>>> the gmem allocations that have been removed from the direct map.
>>
>> Yes. And likely only for memory we really access / try to access, if possible.
> 
> Well, if we start on-demand mm-local mapping the things we want to
> access, we're back in TLB flush hell, no?

At least the on-demand mapping shouldn't require a TLB flush? Only
"unmapping" would, if we want to cap the size of the "mapped pool".

Anyhow, this would be a pure optimization, to avoid the expense of
mapping everything, when in practice you'd likely not access most of it.

(my theory, happy to be told I'm wrong :) )
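
As a rough illustration of that theory, here is a toy user-space model in
C. It is not kernel code: the names (mm_local_pool, pool_access,
POOL_SLOTS) are hypothetical, and it simply assumes that establishing a
new mapping costs no TLB flush while tearing one down to reuse a slot
costs exactly one. It only counts events; it says nothing about real
cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SLOTS 4 /* hypothetical cap on simultaneously mapped pages */

/* Toy model of a bounded pool of mm-local mappings (all names made up). */
struct mm_local_pool {
	unsigned long pfn[POOL_SLOTS];	/* page mapped by each slot */
	bool used[POOL_SLOTS];
	size_t next_victim;		/* round-robin eviction cursor */
	unsigned long maps;		/* mappings established so far */
	unsigned long tlb_flushes;	/* flushes caused by unmapping */
};

/* Return true if pfn currently has a mapping in the pool. */
static bool pool_lookup(const struct mm_local_pool *p, unsigned long pfn)
{
	for (size_t i = 0; i < POOL_SLOTS; i++)
		if (p->used[i] && p->pfn[i] == pfn)
			return true;
	return false;
}

/* Access pfn, mapping it on demand if it is not mapped yet. */
static void pool_access(struct mm_local_pool *p, unsigned long pfn)
{
	if (pool_lookup(p, pfn))
		return;

	for (size_t i = 0; i < POOL_SLOTS; i++) {
		if (!p->used[i]) {
			/* Model assumption: not-present -> present, no flush. */
			p->used[i] = true;
			p->pfn[i] = pfn;
			p->maps++;
			return;
		}
	}

	/* Pool full: reuse a slot. Model assumption: unmapping needs a flush. */
	size_t victim = p->next_victim;

	assert(victim < POOL_SLOTS);
	p->next_victim = (victim + 1) % POOL_SLOTS;
	p->tlb_flushes++;
	p->pfn[victim] = pfn;
	p->maps++;
}
```

In this model the flush counter stays at zero as long as the working set
fits into the pool, and only starts growing once slots have to be
recycled.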

> And we can't know
> ahead-of-time what needs to be mapped, so everything would need to be
> mapped (unless we do something like mm-local mapping a page on first
> access, and then just never unmapping it again, under the assumption
> that establishing the mapping won't be expensive)

Right, the whole problem is that we don't know that upfront.

> 
>>>
>>> I'm wondering, where exactly would be the differences to Sean's idea
>>> about messing with the CR3 register inside KVM to temporarily install
>>> page tables that contain all the gmem stuff, conceptually? Wouldn't we
>>> run into the same interrupt problems that Sean foresaw for the CR3
>>> stuff? (which, admittedly, I still don't quite follow what these are :(
>>> ).
>>
>> I'd need some more details on that. If anything would rely on the direct
>> mapping (from IRQ context?) then ... we obviously cannot remove the
>> direct mapping :)
> 
> I've talked to Fares internally, and it seems that generally doing
> mm-local mappings of guest memory would work for us. We also figured out
> what the "interrupt problem" is, namely that if we receive an interrupt
> while executing in a context that has mm-local mappings available, those
> mappings will continue to be available while the interrupt is being
> handled.

Isn't that likely also the case with secretmem, where we removed the
direct map but effectively have a per-mm mapping in the user-space
portion of the page table?

> I'm talking to my security folks to see how much of a concern
> this is for the speculation hardening we're trying to achieve. Will keep
> you in the loop there :)

Thanks!

-- 
Cheers,

David / dhildenb



