[PATCH v3 0/2] nvme-fabrics: short-circuit connect retries

Hannes Reinecke hare at suse.de
Thu Mar 7 03:45:17 PST 2024


On 3/7/24 12:30, Sagi Grimberg wrote:
> 
> 
> On 07/03/2024 12:37, Hannes Reinecke wrote:
>> On 3/7/24 09:00, Sagi Grimberg wrote:
>>>
>>> On 05/03/2024 10:00, Daniel Wagner wrote:
>>>> I've picked up Hannes' DNR patches. In short, they make the transports
>>>> behave the same way when the DNR bit is set on a re-connect attempt. We
>>>> had a discussion on this topic in the past and, if I got this right, we
>>>> all agreed that the host should honor the DNR bit on a connect
>>>> attempt [1].
>>> Umm, I don't recall this being conclusive though. The spec ought to 
>>> be clearer here I think.
>>
>> I've asked the NVM Express FMDS group, and the response was pretty 
>> unanimous that the DNR bit on connect should be evaluated.
> 
> OK.
> 
>>
>>>>
>>>> The nvme/045 test case (authentication tests) in blktests is a good
>>>> test case for this after extending it slightly. TCP and RDMA try to
>>>> reconnect with an invalid key over and over again, while loop and FC
>>>> stop after the first failure.
>>>
>>> Who says that invalid key is a permanent failure though?
>>>
>> See the response to the other patchset.
>> 'Invalid key' in this context means that the _client_ evaluated the
>> key as invalid, i.e. the key is unusable for the client.
>> As the key is passed in via the command line, there is no way the client
>> can ever change the value here, and no amount of retrying will change
>> things. That's what we try to fix.
> 
> Where is this retried today? I don't see where a connect failure is
> retried, outside of a periodic reconnect.
> Maybe I'm missing what the actual failure is here.

static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
{
         struct nvme_tcp_ctrl *tcp_ctrl =
                         container_of(to_delayed_work(work),
                         struct nvme_tcp_ctrl, connect_work);
         struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;

         ++ctrl->nr_reconnects;

         if (nvme_tcp_setup_ctrl(ctrl, false))
                 goto requeue;

         dev_info(ctrl->device, "Successfully reconnected (%d attempt)\n",
                         ctrl->nr_reconnects);

         ctrl->nr_reconnects = 0;

         return;

requeue:
         dev_info(ctrl->device, "Failed reconnect attempt %d\n",

and nvme_tcp_setup_ctrl() returns either a negative errno or an NVMe 
status code (which might include the DNR bit).
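
To illustrate the return value convention, here is a minimal userspace
sketch of how a reconnect path could tell the two cases apart. It is
only a sketch, not the code from the patchset: connect_should_retry()
is a hypothetical helper, and treating every negative errno as
retryable is an assumption made for the example. The NVME_SC_DNR value
is the one defined in include/linux/nvme.h.

```c
#include <errno.h>
#include <stdbool.h>

/* Same value as NVME_SC_DNR in include/linux/nvme.h. */
#define NVME_SC_DNR 0x4000

/*
 * Hypothetical helper (not an existing kernel function): decide whether
 * a failed connect attempt is worth retrying, given a return value that
 * is 0, a negative errno, or a positive NVMe status code.
 */
static bool connect_should_retry(int ret)
{
	if (ret == 0)
		return false;	/* connected, nothing to retry */
	if (ret < 0)
		return true;	/* errno: assumed transient in this sketch */
	/* Positive value: NVMe status code, honor the Do Not Retry bit. */
	return !(ret & NVME_SC_DNR);
}
```

With this, connect_should_retry(-ECONNREFUSED) asks for another
attempt, while any status with NVME_SC_DNR set stops the reconnect
loop after the first failure.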

Cheers,

Hannes



