[PATCH v2 1/2] nvme: fix memory corruption for passthrough metadata

Kanchan Joshi joshiiitr at gmail.com
Thu Sep 7 13:35:20 PDT 2023


On Thu, Sep 7, 2023 at 9:11 PM Keith Busch <kbusch at kernel.org> wrote:
>
> On Wed, Sep 06, 2023 at 09:18:15PM +0530, Kanchan Joshi wrote:
> > Would you really prefer to have nvme_add_user_metadata() changed to do
> > away with allocation and use userspace meta-buffer directly?
>
> I mean, sure, if it's possible. We can avoid a costly copy if the user
> metabuffer is aligned and physically contiguous.

Seems possible, but that does not really solve the actual problem
(which is not performance) this patch is for.
It will require replicating big code of blk_rq_map_user_iov() for
integrity metadata and map the pages into bip.
But since the user-space meta buffer can be unaligned (and bunch of
other conditions present there), it has to make the meta copy in
kernel-space.
And we will be back to where we started - how to avoid corruption into
kernel memory.

Same situation for the case when we are dealing with
extended-lba-format and interleaved user-buffer is unaligned. .

Handling both these anyway requires adding the kind of code/checks
mentioned in the previous patch.
Do you see another way?

While I agree there is value in avoiding the meta copy in general and
I can look into it, but that should be a separate effort (with focus
on performance).

> > Even with that route, extended-lba-with-short-unaligned-buffer remains
> > unhandled. That will still require similar checks that I would like
> > to avoid but cannnot.
> >
> > So how about this -
>
> There's lots of bad things you can do with this interface. Example,
> provide an unaligned single byte user buffer and send an Identify
> command.

Not sure I follow. Do you mean the patch does not handle these cases?

> We never provided opcode decoding sanity checks before because it's a
> bad maintenance burden, adds performance killing overhead, couldn't
> catch all the cases anyway due to vendor specific and future opcodes,
> and harms the flexibility of the interface.

Given the way things are (integrity schemes, cdw12 etc.), I do not see
a way to avoid opcode checks.
Flexibility is not getting reduced in the previous patch. All the
other commands (beyond read/write/compare/append) remain untouched.
And metadata-io is not the fast path at the moment (given memory
allocation, bunch of extra things by blk-integrity, bunch of extra
things done by device etc.).

>The burden is usually on the
> user for these kinds of priviledged interfaces: if you abuse it, "you
> get to keep both pieces" territory.

Not sure I got that. Have you seen the crash mentioned in this cover
letter: https://lore.kernel.org/linux-nvme/20230811155906.15883-1-joshi.k@samsung.com/
A simple unprivileged read by a rogue application can wreck other
applications/system at the moment.
Is it fine to keep the status quo?



More information about the Linux-nvme mailing list