[PATCH] nvme-tcp: print correct opcode on timeout and error handling

Chaitanya Kulkarni chaitanyak at nvidia.com
Wed Mar 8 15:51:21 PST 2023


Sagi,

On 3/8/23 03:06, Sagi Grimberg wrote:
>
>> In timeout and error handling, various information is reported about the
>> command in error, but the printed opcode may not be correct.
>>
>> The opcode is obtained from the nvme_command structure referenced by the
>> 'cmd' member in the nvme_request structure.
>> (i.e. nvme_req(req)->cmd->common.opcode)
>>
>> For the nvme-tcp driver, the 'cmd' member in the nvme_request structure
>> points to a structure within the page fragment allocated by
>> nvme_tcp_init_request().  This page fragment is used as a command capsule
>> PDU, and then may be reused as a h2c data PDU, so the nvme_command
>> referenced by the 'cmd' member has already been overwritten.
>>
>> To fix this problem, when setting up the nvme_command, keep the opcode in
>> a newly added member in the nvme_tcp_request and use that instead.
>
> I'm wondering if we should just not reuse the cmd pdu for a subsequent
> data pdu...
>
> It will take more space allocated per request (up to a few MBs for lots
> of requests and controllers)...
>
> but it will avoid propagating this to nvme core...
>

Is there a way we can do this without increasing the memory usage?
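
For reference, the overwrite described in the patch can be sketched in
plain userspace C roughly as below. This is only an illustration under
stated assumptions: the demo_* structures, fields and constants are
hypothetical stand-ins and not the real nvme-tcp definitions. It shows why
reading the opcode back from a reused PDU buffer goes wrong, and why a
one-byte copy kept in the request avoids that without a second PDU buffer.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative PDU type values for this sketch only. */
#define DEMO_PDU_CMD      0x04
#define DEMO_PDU_H2C_DATA 0x06

/* Hypothetical, heavily simplified command capsule PDU. */
struct demo_cmd_pdu {
	uint8_t  pdu_type;
	uint8_t  opcode;
	uint16_t command_id;
};

/* Hypothetical, heavily simplified h2c data PDU. */
struct demo_data_pdu {
	uint8_t  pdu_type;
	uint8_t  flags;
	uint16_t command_id;
	uint32_t data_offset;
};

/* Both PDUs share one buffer, mimicking the reused page fragment. */
union demo_pdu {
	struct demo_cmd_pdu  cmd;
	struct demo_data_pdu data;
};

struct demo_request {
	union demo_pdu *pdu;          /* shared cmd/data PDU buffer */
	uint8_t         saved_opcode; /* one-byte copy taken at setup */
};

/* Build the command capsule and remember the opcode separately. */
void demo_setup_cmd(struct demo_request *req, union demo_pdu *buf,
		    uint8_t opcode, uint16_t cid)
{
	memset(buf, 0, sizeof(*buf));
	req->pdu = buf;
	buf->cmd.pdu_type = DEMO_PDU_CMD;
	buf->cmd.opcode = opcode;
	buf->cmd.command_id = cid;
	req->saved_opcode = opcode;
}

/* Reuse the same buffer as a data PDU; the command contents are lost. */
void demo_setup_h2c_data(struct demo_request *req, uint32_t offset)
{
	uint16_t cid = req->pdu->cmd.command_id;

	memset(req->pdu, 0, sizeof(*req->pdu));
	req->pdu->data.pdu_type = DEMO_PDU_H2C_DATA;
	req->pdu->data.command_id = cid;
	req->pdu->data.data_offset = offset;
}

/* Unsafe after reuse: reads whatever now occupies the opcode byte. */
uint8_t demo_opcode_from_buffer(const struct demo_request *req)
{
	return req->pdu->cmd.opcode;
}

/* Safe: independent of what the shared buffer currently holds. */
uint8_t demo_opcode_saved(const struct demo_request *req)
{
	return req->saved_opcode;
}
```

In this sketch the extra cost is one byte per request, versus a whole
second PDU per request if the cmd pdu is never reused.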

-ck