[PATCH] nvme-tcp: print correct opcode on timeout and error handling

Sun Mar 12 05:56:36 PDT 2023

2023年3月9日(木) 18:10 Sagi Grimberg <sagi at grimberg.me>:
>
>
> >> Sagi,
> >>
> >> On 3/8/23 03:06, Sagi Grimberg wrote:
> >>>
> >>>> In timeout and error handling, various information is reported about the
> >>>> command in error, but the printed opcode may not be correct
> >>>>
> >>>> The opcode is obtained from the nvme_command structure referenced by the
> >>>> 'cmd' member in the nvme_request structure.
> >>>> (i.e. nvme_req(req)->cmd->common.opcode)
> >>>>
> >>>> For the nvme-tcp driver, the 'cmd' member in the nvme_request structure
> >>>> points to a structure within the page fragment allocated by
> >>>> nvme_tcp_init_request().  This page fragment is used as a command
> >>>> capsule
> >>>> PDU, and then may be reused as a h2c data PDU, so the nvme_command
> >>>> referenced by the 'cmd' member has already been overwritten.
> >>>>
> >>>> To fix this problem, when setting up the nvme_command, keep the
> >>>> opcode in
> >>>> a newly added member in the nvme_tcp_request and use that instead.
> >>>
> >>> I'm wandering if we should just not reuse the cmd pdu for a subsequent
> >>> data pdu...
> >>>
> >>> It will take more space allocated per request (up to a few MBs for lots
> >>> of requests and controllers)...
> >>>
> >>> but will prevent from propagating this to nvme core...
> >>>
> >>
> >> Is there a way we can do this without increasing the memory usage ?
> >
> > The data pdu has to go somewhere. There doesn't appear to be another free area
> > other than the command pdu, so if that's off limits to preserve the original
> > sqe, then more memory is needed.
>
> If I take a somewhat extreme calculation, where a host has 16
> controllers, 128 queues of depth 128 for each I end up with additional
> 6MB, which is not insignificant for a driver...
>
> We could go with the original proposal of storing just the opcode, but
> it then propagates to the core...
>
> Or...
>
> The cmd pdu has a size of 72 bytes (header + 64 byte command), perhaps
> we can still reuse the same PDU space but rather take the last 24 bytes
> of the cmd pdu?

This also fixed the issue.
Would adding the following static check help avoid unintended changes
in the future?

BUILD_BUG_ON(sizeof(struct nvme_tcp_cmd_pdu) <=
        offsetof(struct nvme_tcp_cmd_pdu, cmd.fabrics.fctype) +
                sizeof(struct nvme_tcp_data_pdu));