[PATCH v2 1/2] nvme: fix memory corruption for passthrough metadata
Kanchan Joshi
joshi.k at samsung.com
Mon Sep 4 22:18:25 PDT 2023
On Fri, Sep 01, 2023 at 10:45:50AM -0400, Keith Busch wrote:
>On Mon, Aug 14, 2023 at 12:32:12PM +0530, Kanchan Joshi wrote:
>> +static bool nvme_validate_passthru_meta(struct nvme_ctrl *ctrl,
>> + struct nvme_ns *ns,
>> + struct nvme_command *c,
>> + __u64 meta, __u32 meta_len)
>> +{
>> + /*
>> + * User may specify smaller meta-buffer with a larger data-buffer.
>> + * Driver allocated meta buffer will also be small.
>> + * Device can do larger dma into that, overwriting unrelated kernel
>> + * memory.
>> + */
>
>What if the user doesn't specify metadata or length for a command that
>uses it? The driver won't set MPTR in that case, causing the device to
>access NULL.
That is fine, because kernel is not going to allocate memory for that
case. I am not trying to provide any saftey net to user-space in this
patch. Rather, the objective is to prevent kernel-memory corruption.
>And similiar to this problem, what if the metadata is extended rather
>than separate, and the user's buffer is too short? That will lead to the
>same type of problem you're trying to fix here?
No.
For extended metadata, userspace is using its own buffer. Since
intermediate kernel buffer does not exist, I do not have a problem to
solve.
>My main concern, though, is forward and backward compatibility. Even
>when metadata is enabled, there are IO commands that don't touch it, so
>some tool that erroneously requested it will stop working. Or perhaps
>some other future opcode will have some other metadata use that doesn't
>match up exactly with how read/write/compare/append use it. As much as
>I'd like to avoid bad user commands from crashing, these kinds of checks
>can become problematic for maintenance.
For forward compatibility - if we have commands that need to specify
metadata in a different way (than what is possible from this interface),
we anyway need a new passthrough command structure.
Moreover, it's really about caring _only_ for cases when kernel allocates
memory for metadata. And those cases are specific (i.e., when
metadata and metalen are not zero). We don't have to think in terms of
opcode (existing or future), no?
For backward comptability, should we care about erroneous tools.
My concern is we currenly have a hole that can be exploited to affect
other sane applications and bring the kernel down to its knees.
More information about the Linux-nvme
mailing list