[PATCH v6 3/6] mm: introduce memfd_secret system call to create "secret" memory areas

Edgecombe, Rick P rick.p.edgecombe at intel.com
Tue Sep 29 16:06:03 EDT 2020


On Tue, 2020-09-29 at 16:06 +0300, Mike Rapoport wrote:
> On Tue, Sep 29, 2020 at 04:58:44AM +0000, Edgecombe, Rick P wrote:
> > On Thu, 2020-09-24 at 16:29 +0300, Mike Rapoport wrote:
> > > Introduce "memfd_secret" system call with the ability to create
> > > memory areas visible only in the context of the owning process
> > > and mapped neither in other processes nor in the kernel page
> > > tables.
> > > 
> > > The user will create a file descriptor using the memfd_secret()
> > > system call, where flags supplied as a parameter to this system
> > > call will define the desired protection mode for the memory
> > > associated with that file descriptor.
> > > 
> > >   Currently there are two protection modes:
> > > 
> > > * exclusive - the memory area is unmapped from the kernel direct
> > >               map and it is present only in the page tables of
> > >               the owning mm.
> > 
> > Seems like there were some concerns raised around direct map
> > efficiency, but in case you are going to rework this...how does
> > this memory work for the existing kernel functionality that does
> > things like this?
> > 
> > get_user_pages(, &page);
> > ptr = kmap(page);
> > foo = *ptr;
> > 
> > Not sure if I'm missing something, but I think apps could cause the
> > kernel to access a not-present page and oops.
> 
> The idea is that this memory should not be accessible by the kernel,
> so the sequence you describe should indeed fail.
> 
> Probably oops would be too noisy and in this case the report needs to
> be less verbose.

I was more concerned that it could cause kernel instabilities.

I see, so it should not be accessed even at the userspace address? I
wonder if it should be prevented somehow then. At least
get_user_pages() should be prevented I think. Blocking copy_*_user()
access might not be simple.

I'm also not so sure that a user would never have any possible reason
to copy data from this memory into the kernel, even if it's just for
convenience. In that case a user setup could break if a specific
kernel implementation switched from copy_*_user() to
get_user_pages()/kmap(). So it seems a bit thorny without either fully
blocking access from the kernel or deprecating that pattern.
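
To make the "copy data from this memory into the kernel" case concrete,
here is a rough userspace sketch, not a definitive test. It assumes an
interface where memfd_secret(0) returns a file descriptor that is sized
with ftruncate() and mapped MAP_SHARED, and it assumes syscall number
447 when the libc headers do not define SYS_memfd_secret. The names
probe_secret_copy() and enum probe_result are made up for illustration.
It hands a buffer in the secret area to write() on a pipe and only
reports what it observes.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447 /* assumption: number used on this arch */
#endif

/* Hypothetical result codes for this sketch. */
enum probe_result {
	PROBE_UNAVAILABLE = 0, /* syscall, ftruncate() or mmap() failed */
	PROBE_COPY_OK     = 1, /* kernel copied the data via write()    */
	PROBE_COPY_FAILED = 2, /* write()/read() failed or data differs */
};

/*
 * Map one page of secret memory, store a string in it, then ask the
 * kernel to copy that string through a pipe and compare the result.
 */
enum probe_result probe_secret_copy(void)
{
	static const char msg[] = "secret";
	enum probe_result res = PROBE_UNAVAILABLE;
	char back[sizeof(msg)] = { 0 };
	int pipefd[2] = { -1, -1 };
	long pagesz = sysconf(_SC_PAGESIZE);
	char *p = MAP_FAILED;
	int fd;

	/* Assumption: flags == 0 selects the default protection mode. */
	fd = (int)syscall(SYS_memfd_secret, 0u);
	if (fd < 0)
		return PROBE_UNAVAILABLE;

	if (pagesz <= 0 || ftruncate(fd, pagesz) < 0)
		goto out;

	p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
		 MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	/* Plain userspace store into the secret area. */
	memcpy(p, msg, sizeof(msg));

	/* From here on, any failure is a failed copy, not "unavailable". */
	res = PROBE_COPY_FAILED;
	if (pipe(pipefd) < 0)
		goto out;

	/* write() makes the kernel read from the userspace address. */
	if (write(pipefd[1], p, sizeof(msg)) == (ssize_t)sizeof(msg) &&
	    read(pipefd[0], back, sizeof(back)) == (ssize_t)sizeof(msg) &&
	    memcmp(back, msg, sizeof(msg)) == 0)
		res = PROBE_COPY_OK;

out:
	if (pipefd[0] >= 0)
		close(pipefd[0]);
	if (pipefd[1] >= 0)
		close(pipefd[1]);
	if (p != MAP_FAILED)
		munmap(p, (size_t)pagesz);
	close(fd);
	return res;
}
```

Calling probe_secret_copy() from a small main() and printing the result
is enough to see which way a given kernel behaves. The sketch only
covers a path that reads through the userspace address. A kernel path
that pins the page and goes through its own mapping of it is the case
that cannot be exercised this simply from userspace.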

You should probably call out these "no passing data to/from the kernel"
expectations, unless I missed them somewhere.

