[PATCH V2 2/3] nvmet-tcp: fix a crash in nvmet_req_complete()

Fri Dec 29 00:51:42 PST 2023

On Mon, Dec 25, 2023 at 11:28, Sagi Grimberg <sagi at grimberg.me> wrote:
>
>
> > In nvmet_tcp_handle_h2c_data_pdu(), if the host sends a data_offset
> > different from rbytes_done, the driver ends up calling nvmet_req_complete()
> > with an error status.
> > The problem is that at this point cmd->req is not yet initialized,
> > so the kernel will crash after dereferencing a NULL pointer.
> >
> > Fix the bug by replacing the call to nvmet_req_complete() with
> > nvmet_tcp_fatal_error().
>
> This is indeed a bug. However nvmet attempts to gracefully fail
> a particular nvme command when there is a recoverable error (see
> nvmet_tcp_handle_req_failure).
>
> In the case where the length is arbitrarily long, then it really
> doesn't make sense for nvmet to just accept it and throw it away,
> but if this is an offset error, maybe it is...
>
> I'm not hard set on this, but it would be beneficial to
> allow some graceful cmd failure here...

But doesn't the NVMe over Fabrics specification explicitly say that
offset and size errors in H2C PDUs should be treated as fatal
transport errors (see NVMe-over-Fabrics-1.1a-2021.07.12, page 69)?
What is really missing here is setting the Fatal Error Status field in
the C2HTermReq PDU; that's why I left the FIXME tag in the comment.
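
To make the intended behaviour concrete, here is a minimal userspace
sketch of the check, not the actual driver code. The struct and function
names (model_cmd, model_queue, model_check_h2c_offset) are hypothetical
stand-ins; only data_offset and rbytes_done come from the patch
description. It assumes the offset has already been converted to host
byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-command receive state. */
struct model_cmd {
	uint32_t rbytes_done;	/* bytes of H2C data received so far */
};

/* Hypothetical stand-in for the per-connection (queue) state. */
struct model_queue {
	bool fatal;		/* set when the connection must be torn down */
};

enum h2c_verdict {
	H2C_ACCEPT,		/* offset matches, keep receiving data */
	H2C_FATAL,		/* protocol violation, fail the whole queue */
};

/*
 * Model of the offset check: a data_offset that does not match the
 * number of bytes already received is treated as a fatal transport
 * error on the queue. No per-request completion is attempted, because
 * the request may not be initialized at this point.
 */
static enum h2c_verdict
model_check_h2c_offset(struct model_queue *queue,
		       const struct model_cmd *cmd,
		       uint32_t data_offset)
{
	if (data_offset != cmd->rbytes_done) {
		queue->fatal = true;
		return H2C_FATAL;
	}
	return H2C_ACCEPT;
}
```

The point of the sketch is only that the error path touches queue
state and never the request, which is what avoids the NULL dereference.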

> Perhaps you can share
> more about what was causing this error?
>

Nothing actually, but someone could write a simple program that
sends invalid packets (or use a fuzzer such as syzkaller) and easily
crash the target's kernel.

Maurizio