[PATCH] nvme-tcp: print correct opcode on timeout and error handling

Wed Mar 8 07:31:22 PST 2023

On Wed, Mar 8, 2023 at 20:06, Sagi Grimberg <sagi at grimberg.me> wrote:
>
>
> > In timeout and error handling, various information is reported about the
> > command in error, but the printed opcode may not be correct.
> >
> > The opcode is obtained from the nvme_command structure referenced by the
> > 'cmd' member in the nvme_request structure.
> > (i.e. nvme_req(req)->cmd->common.opcode)
> >
> > For the nvme-tcp driver, the 'cmd' member in the nvme_request structure
> > points to a structure within the page fragment allocated by
> > nvme_tcp_init_request().  This page fragment is used as a command capsule
> > PDU, and then may be reused as an h2c data PDU, so by the time the opcode
> > is printed, the nvme_command referenced by the 'cmd' member may already
> > have been overwritten.
> >
> > To fix this problem, when setting up the nvme_command, keep the opcode in
> > a newly added member in the nvme_tcp_request and use that instead.
>
> I'm wondering if we should just not reuse the cmd pdu for a subsequent
> data pdu...
>
> It will take more space allocated per request (up to a few MBs for lots
> of requests and controllers)...
>
> but it will avoid propagating this to nvme core...
>
> Something like the below (untested):

Your fix looks good. It actually fixed the problem as well.
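
For anyone following the thread, here is a minimal user-space sketch of the
problem described in the patch: one buffer is first filled as a command
capsule PDU and later reused as an h2c data PDU, so an opcode read back from
the buffer is stale. This is not driver code. Every demo_* name, the buffer
size, and the simplified byte layout (an 8-byte PDU header followed by the
opcode in the capsule, or by the command id in the data PDU) are assumptions
made for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layout: 8-byte PDU header, payload after it. */
#define DEMO_PDU_SIZE        32
#define DEMO_HDR_LEN         8
#define DEMO_PDU_TYPE_CMD    0x04  /* illustrative type values */
#define DEMO_PDU_TYPE_H2C    0x06

struct demo_request {
	unsigned char pdu[DEMO_PDU_SIZE]; /* shared by cmd and h2c data PDUs */
	uint8_t saved_opcode;             /* copy kept outside the buffer */
};

/* Build the command capsule and remember the opcode separately. */
static void demo_setup_cmd(struct demo_request *req, uint8_t opcode)
{
	memset(req->pdu, 0, sizeof(req->pdu));
	req->pdu[0] = DEMO_PDU_TYPE_CMD;
	req->pdu[DEMO_HDR_LEN] = opcode;  /* first byte of the command */
	req->saved_opcode = opcode;
}

/* Reuse the same buffer as an h2c data PDU, overwriting the command. */
static void demo_setup_h2c_data(struct demo_request *req, uint16_t command_id)
{
	memset(req->pdu, 0, sizeof(req->pdu));
	req->pdu[0] = DEMO_PDU_TYPE_H2C;
	req->pdu[DEMO_HDR_LEN] = (unsigned char)(command_id & 0xff);
	req->pdu[DEMO_HDR_LEN + 1] = (unsigned char)(command_id >> 8);
}

/* What error handling sees if it reads the opcode out of the buffer. */
static uint8_t demo_opcode_from_pdu(const struct demo_request *req)
{
	return req->pdu[DEMO_HDR_LEN];
}

/* What error handling sees if it uses the separately saved copy. */
static uint8_t demo_opcode_saved(const struct demo_request *req)
{
	return req->saved_opcode;
}
```

Calling demo_setup_cmd() with opcode 0x01 and then demo_setup_h2c_data()
with command id 0x1234 leaves demo_opcode_from_pdu() returning the low byte
of the command id, while demo_opcode_saved() still returns 0x01. Not reusing
the cmd pdu, as suggested above, avoids the overwrite without needing the
extra member, at the cost of more memory per request.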