[PATCH v2 1/2] nvme: fix memory corruption for passthrough metadata

Keith Busch kbusch at kernel.org
Tue Sep 5 11:08:40 PDT 2023


On Tue, Sep 05, 2023 at 10:48:25AM +0530, Kanchan Joshi wrote:
> On Fri, Sep 01, 2023 at 10:45:50AM -0400, Keith Busch wrote:
> > And similar to this problem, what if the metadata is extended rather
> > than separate, and the user's buffer is too short? That will lead to the
> > same type of problem you're trying to fix here?
> 
> No.
> For extended metadata, userspace is using its own buffer. Since no
> intermediate kernel buffer exists, I do not have a problem to
> solve.

We still use kernel memory if the user buffer is unaligned. If user
space provides a short, unaligned buffer, the device will corrupt kernel
memory.
 
> > My main concern, though, is forward and backward compatibility. Even
> > when metadata is enabled, there are IO commands that don't touch it, so
> > some tool that erroneously requested it will stop working. Or perhaps
> > some other future opcode will have some other metadata use that doesn't
> > match up exactly with how read/write/compare/append use it. As much as
> > I'd like to avoid bad user commands from crashing, these kinds of checks
> > can become problematic for maintenance.
> 
> For forward compatibility - if we have commands that need to specify
> metadata in a different way (than what is possible from this interface),
> we anyway need a new passthrough command structure.

Not sure about that. The existing struct is flexible enough to describe
any possible nvme command.

More specifically, the compatibility problem is that this patch assumes
an "nlb" field exists inside an opaque structure at the DW12 offset, and
that this field defines how large the metadata buffer needs to be. A
vendor-specific or future opcode may use DW12 for something completely
different yet still need to access metadata, and this patch could
prevent such a command from working.

> Moreover, it's really about caring _only_ for cases when the kernel
> allocates memory for metadata. And those cases are specific (i.e., when
> metadata and metalen are not zero). We don't have to think in terms of
> opcode (existing or future), no?

It looks like a bit of work, but I don't see why blk-integrity must use
kernel memory. Introducing an API like 'bio_integrity_map_user()' might
also address your concern, as long as the user buffer is aligned. It
sounds like we're assuming user buffers are aligned, at least.


