[PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas
mhocko at suse.com
Tue Feb 9 03:47:08 EST 2021
On Mon 08-02-21 23:26:05, Mike Rapoport wrote:
> On Mon, Feb 08, 2021 at 11:49:22AM +0100, Michal Hocko wrote:
> > On Mon 08-02-21 10:49:17, Mike Rapoport wrote:
> > > The file descriptor based memory has several advantages over the
> > > "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). It
> > > paves the way for VMMs to remove the secret memory range from the process;
> > I do not understand how it helps to remove the memory from the process
> > as the interface explicitly allows to add a memory that is removed from
> > all other processes via direct map.
> The current implementation does not help to remove the memory from the
> process, but using fd-backed memory seems a better interface to remove
> guest memory from host mappings than mmap. As Andy nicely put it:
> "Getting fd-backed memory into a guest will take some possibly major work in
> the kernel, but getting vma-backed memory into a guest without mapping it
> in the host user address space seems much, much worse."
OK, so IIUC this means that the model is to hand over memory from host
to guest. I thought the guest would be under control of its address
space and therefore it operates on the VMAs. This would benefit from
an additional and more specific clarification.
> > > As secret memory implementation is not an extension of tmpfs or hugetlbfs,
> > > usage of a dedicated system call rather than hooking new functionality into
> > > memfd_create(2) emphasises that memfd_secret(2) has different semantics and
> > > allows better upwards compatibility.
> > What is this supposed to mean? What are differences?
> Well, the phrasing could be better indeed. That supposed to mean that
> they differ in the semantics behind the file descriptor: memfd_create
> implements sealing for shmem and hugetlbfs while memfd_secret implements
> memory hidden from the kernel.
Right but why memfd_create model is not sufficient for the usecase?
Please note that I am arguing against. To be honest I do not really care
much. Using an existing scheme is usually preferable from my POV but
there might be real reasons why shmem as a backing "storage" is not
> > > The secretmem mappings are locked in memory so they cannot exceed
> > > RLIMIT_MEMLOCK. Since these mappings are already locked an attempt to
> > > mlock() secretmem range would fail and mlockall() will ignore secretmem
> > > mappings.
> > What about munlock?
> Isn't this implied? ;-)
My bad here. I thought that munlock fails on vmas which are not mlocked
and I was curious about the behavior when mlockall() is followed by
munlock. But I do not see this being the case. So this should be ok.
More information about the linux-riscv