[PATCH v3 0/2] nvme-fabrics: short-circuit connect retries
Hannes Reinecke
hare at suse.de
Thu Mar 7 03:45:17 PST 2024
On 3/7/24 12:30, Sagi Grimberg wrote:
>
>
> On 07/03/2024 12:37, Hannes Reinecke wrote:
>> On 3/7/24 09:00, Sagi Grimberg wrote:
>>>
>>> On 05/03/2024 10:00, Daniel Wagner wrote:
>>>> I've picked up Hannes' DNR patches. In short the make the transports
>>>> behave the same way when the DNR bit set on a re-connect attempt. We
>>>> had a discussion this
>>>> topic in the past and if I got this right we all agreed is that the
>>>> host should honor the DNR bit on a connect attempt [1]
>>> Umm, I don't recall this being conclusive though. The spec ought to
>>> be clearer here I think.
>>
>> I've asked the NVMexpress fmds group, and the response was pretty
>> unanimous that the DNR bit on connect should be evaluated.
>
> OK.
>
>>
>>>>
>>>> The nvme/045 test case (authentication tests) in blktests is a good
>>>> test case for this after extending it slightly. TCP and RDMA try to
>>>> reconnect with an
>>>> invalid key over and over again, while loop and FC stop after the
>>>> first fail.
>>>
>>> Who says that invalid key is a permanent failure though?
>>>
>> See the response to the other patchset.
>> 'Invalid key' in this context means that the _client_ evaluated the
>> key as invalid, ie the key is unusable for the client.
>> As the key is passed in via the commandline there is no way the client
>> can ever change the value here, and no amount of retry will change
>> things here. That's what we try to fix.
>
> Where is this retried today, I don't see where connect failure is
> retried, outside of a periodic reconnect.
> Maybe I'm missing where what is the actual failure here.
static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
{
struct nvme_tcp_ctrl *tcp_ctrl =
container_of(to_delayed_work(work),
struct nvme_tcp_ctrl, connect_work);
struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;
++ctrl->nr_reconnects;
if (nvme_tcp_setup_ctrl(ctrl, false))
goto requeue;
dev_info(ctrl->device, "Successfully reconnected (%d attempt)\n",
ctrl->nr_reconnects);
ctrl->nr_reconnects = 0;
return;
requeue:
dev_info(ctrl->device, "Failed reconnect attempt %d\n",
and nvme_tcp_setup_ctrl() returns either a negative errno or an NVMe
status code (which might include the DNR bit).
Cheers,
Hannes
More information about the Linux-nvme
mailing list