[PATCH 3/6] mm: introduce secretmemfd system call to create "secret" memory areas

Mon Jul 20 15:16:25 EDT 2020

On Mon, 2020-07-20 at 20:08 +0200, Arnd Bergmann wrote:
> On Mon, Jul 20, 2020 at 5:52 PM James Bottomley <jejb at linux.ibm.com>
> wrote:
> > On Mon, 2020-07-20 at 13:30 +0200, Arnd Bergmann wrote:
> > 
> > I'll assume you mean the dmabuf userspace API?  Because the kernel
> > API is completely device exchange specific and wholly inappropriate
> > for this use case.
> > 
> > The user space API of dmabuf uses a pseudo-filesystem.  So you
> > mount the dmabuf file type (and by "you" I mean root because an
> > ordinary user doesn't have sufficient privilege).  This is
> > basically because every dmabuf is usable by any user who has
> > permissions.  This really isn't the initial interface we want for
> > secret memory because secret regions are supposed to be per process
> > and not shared (at least we don't want other tenants to see who's
> > using what).
> > 
> > Once you have the fd, you can seek to find the size, mmap, poll and
> > ioctl it.  The ioctls are all to do with memory synchronization (as
> > you'd expect from a device backed region) and the mmap is handled
> > by the dma_buf_ops, which is device specific.  Sizing is missing
> > because that's reported by the device not settable by the user.
> 
> I was mainly talking about the in-kernel interface that is used for
> sharing a buffer with hardware. Aside from the limited ioctls,
> anything in the kernel can decide on how it wants to export a dma_buf
> by calling dma_buf_export()/dma_buf_fd(), which is roughly what the
> new syscall does as well. Using dma_buf vs the proposed
> implementation for this is not a big difference in complexity.

I have thought about it, but haven't got much further: we can't couple
to SGX without a huge break in the current simple userspace API (it
becomes complex because you'd have to enter the enclave each time you
want to use the memory, or put the whole process in the enclave, which
is a bit of a nightmare for simplicity), and we could only couple it to
SEV if the memory encryption engine would respond to PCID as well as
ASID, which it doesn't.

> The one thing that a dma_buf does is that it allows devices to
> do DMA on it. This is either something that can turn out to be
> useful later, or it is not. From the description, it sounded like
> the sharing might be useful, since we already have known use
> cases in which "secret" data is exchanged with a trusted execution
> environment using the dma-buf interface.

The current use case for private keys is that you take an encrypted
file (which would be the DMA coupled part) and you decrypt the contents
into the secret memory.  There might possibly be a DMA component later
where an HSM-like device DMAs a key directly into your secret memory to
avoid exposure, but I wouldn't anticipate any need for anything beyond
the usual page cache API for that case (effectively this would behave
like an ordinary page cache page except that only the current process
would be able to touch the contents).

> If there is no way the data stored in this new secret memory area
> would relate to secret data in a TEE or some other hardware
> device, then I agree that dma-buf has no value.

Never say never, but current TEE designs tend to require full
confidentiality for the entire execution.  What we're probing is
whether we can improve security by providing an API that requires less
than full confidentiality for the process.  I think if the API proves
useful then we will get HW support for it, but it likely won't take the
form of today's TEEs.

> > What we want is the ability to get an fd, set the properties and
> > the size and mmap it.  This is pretty much a 100% overlap with the
> > memfd API and not much overlap with the dmabuf one, which is why I
> > don't think the interface is very well suited.
> 
> Does that mean you are suggesting to use additional flags on
> memfd_create() instead of a new system call?

Well, that was what the previous patch did.  I'm agnostic on the
mechanism for obtaining the fd: new syscall as this patch does or
extension to memfd like the old one did.  All I was saying is that once
you have the fd, the API you use on it is the same as the memfd API.
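
To illustrate the "get an fd, set the size and mmap it" flow, here is a
minimal sketch using the existing memfd_create() only.  It is not taken
from the patch: the helper names map_fd_region() and memfd_roundtrip()
are hypothetical, and the idea that an fd from the proposed syscall
would be driven the same way is an assumption based on the discussion
above, not a description of the final interface.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: size the object behind fd and map it.
 * Returns NULL on failure.  Nothing here is specific to memfd; it
 * only relies on ftruncate() and mmap() working on the fd.
 */
static void *map_fd_region(int fd, size_t size)
{
	void *p;

	if (ftruncate(fd, (off_t)size) != 0)
		return NULL;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical demo: obtain an fd with memfd_create(), then size it,
 * map it, store some bytes and read them back.
 * Returns 0 on success, -1 on any failure.
 */
static int memfd_roundtrip(void)
{
	const size_t size = 4096;
	const char msg[] = "key material";
	char *p;
	int fd, ok;

	/* Assumption: a secret memory fd would replace this call only. */
	fd = memfd_create("example", MFD_CLOEXEC);
	if (fd < 0)
		return -1;

	p = map_fd_region(fd, size);
	if (!p) {
		close(fd);
		return -1;
	}

	memcpy(p, msg, sizeof(msg));
	ok = memcmp(p, msg, sizeof(msg)) == 0;

	munmap(p, size);
	close(fd);
	return ok ? 0 : -1;
}
```

Only the call that produces the fd would differ between the two
mechanisms under discussion; everything after it is the ordinary
ftruncate()/mmap() sequence.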

James