[PATCH] nvme-tcp: print correct opcode on timeout and error handling

Sagi Grimberg sagi at grimberg.me
Mon Mar 13 01:54:55 PDT 2023


>>>> Sagi,
>>>>
>>>> On 3/8/23 03:06, Sagi Grimberg wrote:
>>>>>
>>>>>> In timeout and error handling, various information is reported about the
>>>>>> command in error, but the printed opcode may not be correct.
>>>>>>
>>>>>> The opcode is obtained from the nvme_command structure referenced by the
>>>>>> 'cmd' member in the nvme_request structure.
>>>>>> (i.e. nvme_req(req)->cmd->common.opcode)
>>>>>>
>>>>>> For the nvme-tcp driver, the 'cmd' member in the nvme_request structure
>>>>>> points to a structure within the page fragment allocated by
>>>>>> nvme_tcp_init_request().  This page fragment is used as a command
>>>>>> capsule PDU, and may then be reused as an h2c data PDU, so the
>>>>>> nvme_command referenced by the 'cmd' member may already have been
>>>>>> overwritten.
>>>>>>
>>>>>> To fix this problem, when setting up the nvme_command, keep the
>>>>>> opcode in
>>>>>> a newly added member in the nvme_tcp_request and use that instead.
>>>>>
>>>>> I'm wondering if we should just not reuse the cmd pdu for a subsequent
>>>>> data pdu...
>>>>>
>>>>> It will take more space allocated per request (up to a few MBs for lots
>>>>> of requests and controllers)...
>>>>>
>>>>> but will prevent propagating this to the nvme core...
>>>>>
>>>>
>>>> Is there a way we can do this without increasing the memory usage ?
>>>
>>> The data pdu has to go somewhere. There doesn't appear to be another free area
>>> other than the command pdu, so if that's off limits to preserve the original
>>> sqe, then more memory is needed.
>>
>> If I take a somewhat extreme calculation, where a host has 16
>> controllers, each with 128 queues of depth 128, I end up with an
>> additional ~6MB (16 x 128 x 128 requests x 24 bytes per data PDU),
>> which is not insignificant for a driver...
>>
>> We could go with the original proposal of storing just the opcode, but
>> it then propagates to the core...
>>
>> Or...
>>
>> The cmd pdu has a size of 72 bytes (8 byte header + 64 byte command);
>> perhaps we can still reuse the same PDU space, but have the data pdu
>> take the last 24 bytes of the cmd pdu?
> 
> This also fixed the issue.

I'm assuming this is equivalent to your Tested-by tag?

> Would adding the following static check help avoid unintended changes
> in the future?
> 
> BUILD_BUG_ON(sizeof(struct nvme_tcp_cmd_pdu) <=
>          offsetof(struct nvme_tcp_cmd_pdu, cmd.fabrics.fctype) +
>                  sizeof(struct nvme_tcp_data_pdu));

Umm, the arithmetic itself in nvme_tcp_req_data_pdu prevents this sort
of thing (see the sketch after the hunk below). What we probably need
regardless, and is currently missing, is:
--
@@ -2679,6 +2691,15 @@ static struct nvmf_transport_ops nvme_tcp_transport = {

  static int __init nvme_tcp_init_module(void)
  {
+       BUILD_BUG_ON(sizeof(struct nvme_tcp_hdr) != 8);
+       BUILD_BUG_ON(sizeof(struct nvme_tcp_cmd_pdu) != 72);
+       BUILD_BUG_ON(sizeof(struct nvme_tcp_data_pdu) != 24);
+       BUILD_BUG_ON(sizeof(struct nvme_tcp_rsp_pdu) != 24);
+       BUILD_BUG_ON(sizeof(struct nvme_tcp_r2t_pdu) != 24);
+       BUILD_BUG_ON(sizeof(struct nvme_tcp_icreq_pdu) != 128);
+       BUILD_BUG_ON(sizeof(struct nvme_tcp_icresp_pdu) != 128);
+       BUILD_BUG_ON(sizeof(struct nvme_tcp_term_pdu) != 24);
+
         nvme_tcp_wq = alloc_workqueue("nvme_tcp_wq",
                         WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
         if (!nvme_tcp_wq)
--
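
For reference, a rough sketch of what the tail-reuse arithmetic could
look like; the helper names follow the discussion above, and it assumes
the request keeps a single req->pdu buffer sized for the command capsule
PDU, so treat it as illustrative rather than the final patch:
--
static inline void *nvme_tcp_req_cmd_pdu(struct nvme_tcp_request *req)
{
        return req->pdu;
}

static inline void *nvme_tcp_req_data_pdu(struct nvme_tcp_request *req)
{
        /*
         * Carve the h2c data PDU (24 bytes) out of the tail of the
         * 72-byte cmd PDU, so the start of the sqe (opcode, flags,
         * command_id) is never overwritten.
         */
        return req->pdu + sizeof(struct nvme_tcp_cmd_pdu) -
               sizeof(struct nvme_tcp_data_pdu);
}
--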

I'll send this and the fix promptly.


