[PATCH v6 3/6] mm: introduce memfd_secret system call to create "secret" memory areas
Edgecombe, Rick P
rick.p.edgecombe at intel.com
Tue Sep 29 16:06:03 EDT 2020
On Tue, 2020-09-29 at 16:06 +0300, Mike Rapoport wrote:
> On Tue, Sep 29, 2020 at 04:58:44AM +0000, Edgecombe, Rick P wrote:
> > On Thu, 2020-09-24 at 16:29 +0300, Mike Rapoport wrote:
> > > Introduce "memfd_secret" system call with the ability to create
> > > memory
> > > areas visible only in the context of the owning process and not
> > > mapped not
> > > only to other processes but in the kernel page tables as well.
> > >
> > > The user will create a file descriptor using the memfd_secret()
> > > system call
> > > where flags supplied as a parameter to this system call will
> > > define
> > > the
> > > desired protection mode for the memory associated with that file
> > > descriptor.
> > >
> > > Currently there are two protection modes:
> > >
> > > * exclusive - the memory area is unmapped from the kernel direct
> > > map
> > > and it
> > > is present only in the page tables of the owning
> > > mm.
> >
> > Seems like there were some concerns raised around direct map
> > efficiency, but in case you are going to rework this...how does
> > this
> > memory work for the existing kernel functionality that does things
> > like
> > this?
> >
> > get_user_pages(, &page);
> > ptr = kmap(page);
> > foo = *ptr;
> >
> > Not sure if I'm missing something, but I think apps could cause the
> > kernel to access a not-present page and oops.
>
> The idea is that this memory should not be accessible by the kernel,
> so
> the sequence you describe should indeed fail.
>
> Probably oops would be to noisy and in this case the report needs to
> be
> less verbose.
I was more concerned that it could cause kernel instabilities.
I see, so it should not be accessed even at the userspace address? I
wonder if it should be prevented somehow then. At least
get_user_pages() should be prevented I think. Blocking copy_*_user()
access might not be simple.
I'm also not so sure that a user would never have any possible reason
to copy data from this memory into the kernel, even if it's just
convenience. In which case a user setup could break if a specific
kernel implementation switched to get_user_pages()/kmap() from using
copy_*_user(). So seems maybe a bit thorny without fully blocking
access from the kernel, or deprecating that pattern.
You should probably call out these "no passing data to/from the kernel"
expectations, unless I missed them somewhere.
More information about the linux-riscv
mailing list