[PATCH v3] nvme: fix memory corruption for passthrough metadata

Kanchan Joshi joshiiitr at gmail.com
Thu Oct 12 19:19:19 PDT 2023


On Thu, Oct 12, 2023 at 9:01 PM Keith Busch <kbusch at kernel.org> wrote:
>
> On Thu, Oct 12, 2023 at 06:36:52AM +0200, Christoph Hellwig wrote:
> > On Wed, Oct 11, 2023 at 11:04:58AM -0600, Keith Busch wrote:
> >
> > > I don't think it's reasonable for the driver to decode every passthrough
> > > command to validate the data lengths, or reject ones that we don't know
> > > how to decode. SG_IO doesn't do that either.
> >
> > I don't want that either, but what can we do against a (possibly
> > unprivileged) user corrupting data?
>
> The unprivileged access is kind of recent. Maybe limit the scope of
> decoding to that usage?

I can send an iteration today that takes this route.
That may be preferable to dropping a useful feature.

> We've always known the interface can be misused to corrupt memory and/or
> data, and it was always the user's responsibility to use this interface
> responsibly. We shouldn't disable something people have relied on for
> over 10 years just because someone rediscovered ways to break it.
>
> It's not like this is a "metadata" specific thing either; you can
> provide short user space buffers and corrupt memory with regular admin
> commands, and we have been able to do that from day 1. But if you abuse
> this interface, it was always your fault; the kernel never took
> responsibility to sanity check your nvme input, and I think it's a bad
> precedent to start doing it.

In my mind, this was about handling the specific case where kernel
memory is used for device DMA.
There are just two such cases: (i) a separate meta buffer, and (ii) a
bounce buffer for data (+metadata).
I had not planned sanity checks on user input for anything beyond that.
Rather than being preventive in all cases, the idea was to fail
only when we are certain that DMA will take place and will corrupt
kernel memory.
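
To make the scope concrete, here is a rough userspace sketch of that
check. It is not the patch code: struct pt_req, its fields, and
pt_req_would_corrupt() are hypothetical names used only for
illustration. It assumes nlb is a plain block count (not the 0-based
field from the command) and that the kernel buffers are sized from the
user-supplied lengths.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified description of one I/O passthrough request. */
struct pt_req {
	uint32_t nlb;            /* number of logical blocks to transfer */
	uint32_t lba_size;       /* data bytes per logical block */
	uint32_t meta_size;      /* metadata bytes per logical block */
	uint32_t data_len;       /* data length supplied by the user */
	uint32_t meta_len;       /* metadata length supplied by the user */
	bool kernel_meta_buf;    /* case (i): separate kernel meta buffer */
	bool kernel_bounce_buf;  /* case (ii): kernel bounce buffer for data */
	bool extended_lba;       /* metadata is interleaved with the data */
};

/* Return true only when the device would DMA past a kernel buffer. */
static bool pt_req_would_corrupt(const struct pt_req *r)
{
	uint64_t need_data = (uint64_t)r->nlb * r->lba_size;
	uint64_t need_meta = (uint64_t)r->nlb * r->meta_size;

	/* Case (i): the separate meta buffer is smaller than the transfer. */
	if (r->kernel_meta_buf && need_meta > r->meta_len)
		return true;

	/* Case (ii): the bounce buffer is smaller than data (+metadata). */
	if (r->kernel_bounce_buf) {
		if (r->extended_lba)
			need_data += need_meta;
		if (need_data > r->data_len)
			return true;
	}

	/* No kernel memory involved, or the lengths fit: do not interfere. */
	return false;
}
```

With neither flag set the function never rejects, however short the
user lengths are, which is the "fail only when certain" behaviour
described above.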

In the long term, it may be possible for this path to do away with
memory copies. The checks can disappear once that happens.


