[PATCH v17 07/10] mm: introduce memfd_secret system call to create "secret" memory areas

Thu Feb 11 06:20:08 EST 2021

On Thu, Feb 11, 2021 at 09:39:38AM +0100, Michal Hocko wrote:
> On Thu 11-02-21 09:13:19, Mike Rapoport wrote:
> > On Tue, Feb 09, 2021 at 02:17:11PM +0100, Michal Hocko wrote:
> > > On Tue 09-02-21 11:09:38, Mike Rapoport wrote:
> [...]
> > > > Citing my older email:
> > > > 
> > > >     I've hesitated whether to continue to use new flags to memfd_create() or to
> > > >     add a new system call and I've decided to use a new system call after I've
> > > >     started to look into man pages update. There would have been two completely
> > > >     independent descriptions and I think it would have been very confusing.
> > > 
> > > Could you elaborate? Unmapping from the kernel address space can work
> > > both for sealed or hugetlb memfds, no? Those features are completely
> > > orthogonal AFAICS. With a dedicated syscall you will need to introduce
> > > this functionality on top if that is required. Have you considered that?
> > > I mean hugetlb pages are used to back guest memory very often. Is this
> > > something that will be a secret memory usecase?
> > > 
> > > Please be really specific when giving arguments to back a new syscall
> > > decision.
> > 
> > Isn't "syscalls have completely independent description" specific enough?
> 
> No, it's not as you can see from questions I've had above. More on that
> below.
> 
> > We are talking about API here, not the implementation details whether
> > secretmem supports large pages or not.
> > 
> > The purpose of memfd_create() is to create a file-like access to memory.
> > The purpose of memfd_secret() is to create a way to access memory hidden
> > from the kernel.
> > 
> > I don't think overloading memfd_create() with the secretmem flags because
> > they happen to return a file descriptor will be better for users, but
> > rather will be more confusing.
> 
> This is quite a subjective conclusion. I could very well argue that it
> would be much better to have a single syscall to get a fd backed memory
> with spedific requirements (sealing, unmapping from the kernel address
> space). 

> Neither of us would be clearly right or wrong.

100% agree :)

> A more important point is a future extensibility and usability, though.
> So let's just think of few usecases I have outlined above. Is it
> unrealistic to expect that secret memory should be sealable? What about
> hugetlb? Because if the answer is no then a new API is a clear win as the
> combination of flags would never work and then we would just suffer from
> the syscall multiplexing without much gain. On the other hand if
> combination of the functionality is to be expected then you will have to
> jam it into memfd_create and copy the interface likely causing more
> confusion. See what I mean?

I see your point, but I think that overloading memfd_create definitely gets
us into syscall multiplexing from day one and support for seals and huge
pages in the secretmem will not make it less of a multiplexer.

Sealing is anyway controlled via fcntl() and I don't think
MFD_ALLOW_SEALING makes much sense for the secretmem because it is there to
prevent rogue file sealing in tmpfs/hugetlbfs.

As for the huge pages, I'm not sure at all that supporting huge pages in
secretmem will involve hugetlbfs. And even if yes, adding SECRETMEM_HUGE
flag seems to me less confusing than saying "from kernel x.y you can use
MFD_CREATE | MFD_SECRET | MFD_HUGE" etc for all possible combinations.

> I by no means do not insist one way or the other but from what I have
> seen so far I have a feeling that the interface hasn't been thought
> through enough.

It has been, but we have different thoughts about it ;-)

-- 
Sincerely yours,
Mike.