[PATCH 5/5] nvme: retry authentication commands if DNR status bit is not set
Sagi Grimberg
sagi at grimberg.me
Tue Feb 14 01:39:20 PST 2023
>>>>>>> Clear the FAILFAST_DRIVER bit for authentication commands
>>>>>>> allowing them to be retried in nvme_decide_disposition() if the DNR
>>>>>>> bit is not set in the command result.
>>>>>>
>>>>>> Can you please explain what this is fixing? And is there a test
>>>>>> to reproduce this?
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>>>>>>> ---
>>>>>>> drivers/nvme/host/auth.c | 2 ++
>>>>>>> 1 file changed, 2 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>>>>>>> index ef5dcd885b0c..935f340607a7 100644
>>>>>>> --- a/drivers/nvme/host/auth.c
>>>>>>> +++ b/drivers/nvme/host/auth.c
>>>>>>> @@ -87,6 +87,8 @@ static int nvme_auth_submit(struct nvme_ctrl *ctrl, int qid,
>>>>>>> "qid %d auth_submit failed to map, error %d\n",
>>>>>>> qid, ret);
>>>>>>> } else {
>>>>>>> + /* Clear failfast flag to allow for retries */
>>>>>>> + req->cmd_flags &= ~REQ_FAILFAST_DRIVER;
>>>>>>
>>>>>> This should come right after the allocation.
>>>>>> But what makes this any different from other fabrics/admin commands?
>>>>>> Why is it different from connect, for example?
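For reference: REQ_FAILFAST_DRIVER is set on these requests during
initialization (nvme_init_request() in recent kernels), and it matters
here because the retry decision bails out early for failfast requests.
Roughly, modulo kernel version:

    #define blk_noretry_request(rq) \
            ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
                                 REQ_FAILFAST_DRIVER))

Clearing the flag makes blk_noretry_request() false, so the DNR bit and
the retry count become the deciding factors in
nvme_decide_disposition(), quoted further down.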
>>>>>
>>>>> It is not different; it does, however, introduce a difference in
>>>>> behaviour.
>>>>> And hence I didn't enable it for all commands, as previously we
>>>>> didn't retry, and it's questionable if we should for things like
>>>>> 'reg_read'/'reg_write' and friends.
>>>>>
>>>>> You are correct about connect, though; we really should retry
>>>>> that one, too, if the DNR bit is not set.
>>>>
>>>> I'm not sure. If the transport is unavailable, we cannot set DNR, which
>>>> will lead to a retry of the connect (or equivalent) command. Today the
>>>> host will schedule another full controller connect within
>>>> reconnect_delay, which is what we want right?
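For context, "schedule another full controller connect" means the
transport's reconnect logic; abridged sketch from the TCP transport,
other transports are similar:

    static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
    {
            /* either re-arm the reconnect work within reconnect_delay... */
            if (nvmf_should_reconnect(ctrl))
                    queue_delayed_work(nvme_wq,
                                       &to_tcp_ctrl(ctrl)->connect_work,
                                       ctrl->opts->reconnect_delay * HZ);
            else    /* ...or give up and remove the controller */
                    nvme_delete_ctrl(ctrl);
    }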
>>>
>>> I knew this would be coming up, which is why I left it out of the
>>> initial patchset.
>>>
>>> Long explanation:
>>>
>>> Thing is, connect handling is not _just_ connect handling.
>>> With Linux we only have a single (ioctl-like) entry point for the
>>> userspace 'nvme connect' call.
>>>
>>> That really does several things (sketched in code below):
>>> - set up the transport
>>> - create the transport association for the admin queue
>>> - create the admin queue
>>> - send 'connect' over that queue
>>> - create the transport associations for the i/o queues
>>> - create the i/o queues
>>> - send 'connect' over those queues
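As a sketch, the call sequence behind a single 'nvme connect' looks
roughly like this; names taken from the fabrics core and the TCP
transport, heavily simplified:

    ctrl = nvmf_create_ctrl(dev, buf);          /* set up the transport */
    nvme_tcp_configure_admin_queue(ctrl, true); /* association + admin queue;
                                                   issues the Fabrics Connect
                                                   via nvmf_connect_admin_queue() */
    nvme_tcp_configure_io_queues(ctrl, true);   /* associations + i/o queues;
                                                   each issues a Fabrics Connect
                                                   via nvmf_connect_io_queue() */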
>>>
>>> And my point here is that the actual 'connect' commands really are
>>> straight NVMe commands, and as such should be subject to the normal
>>> status evaluation rules for NVMe commands.
>>> The problem here is the strange 'half-breed' nature of the connect
>>> command:
>>>
>>> _Technically_ the association and the 'connect' command are
>>> independent, and as such we would be entirely within spec to allow
>>> the 'connect'
>>> command to fail without having the association torn down. But due to
>>> our implementation that would leave us unable to re-use the association
>>> for further connect commands, as that would need to be triggered by
>>> userspace (which will always create a new association).
>>>
>>> Consequently I would advocate that any non-retryable failure of
>>> the 'connect' command should always tear down the association, too.
>>>
>>> That would be a deviation from our normal rules (where we always retry
>>> transport failures), but really is something we need to do if we
>>> enable DNR evaluation for the 'connect' command.
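If I read the proposal correctly, the status evaluation for 'connect'
would look something like this hypothetical helper (not in the tree;
NVME_SC_DNR and nvme_max_retries as in drivers/nvme/host):

    /* nvme_sc is the NVMe status of the Connect command:
     * 0 on success, negative on transport error. */
    static bool nvme_connect_should_retry(int nvme_sc, unsigned int retries)
    {
            if (nvme_sc <= 0)          /* success, or a transport error that
                                          the reconnect logic handles */
                    return false;
            if (nvme_sc & NVME_SC_DNR) /* controller said: do not retry;
                                          tear down the association, too */
                    return false;
            return retries < nvme_max_retries;
    }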
>>
>> How the host handles a failing connect (or any other request) is
>> entirely the host's choice.
>>
>> In this patch, you decided to effectively retry on transport errors,
>> which will retry up to nvme_max_retries times. Can you please help
>> me understand what the patch is actually solving? I'm assuming that
>> this is an actual completion that you see from the controller; my
>> second question is what happens with this patch when the transport is
>> the source of the completion?
>>
> Hmm. All this patch does is enable retries by modifying the flow
> in nvme_decide_disposition().
> And nvme_decide_disposition() only deals with NVMe status codes; if
> nvme_req(req)->status were an error number we'd have much worse
> problems than just retries.
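For reference, the disposition logic in question looks roughly like
this (drivers/nvme/host/core.c, abridged; exact details vary by kernel
version):

    static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
    {
            if (likely(nvme_req(req)->status == 0))
                    return COMPLETE;

            /* no retry if failfast, DNR set, or retries exhausted */
            if (blk_noretry_request(req) ||
                (nvme_req(req)->status & NVME_SC_DNR) ||
                nvme_req(req)->retries >= nvme_max_retries)
                    return COMPLETE;

            /* multipath may fail over instead of retrying on this path */
            if ((req->cmd_flags & REQ_NVME_MPATH) &&
                nvme_is_path_error(nvme_req(req)->status))
                    return FAILOVER;

            return RETRY;
    }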
Hannes, can you please describe the flow that went wrong here, the one
you attempted to fix?
My assumption is that the host got a non-zero nvme status with DNR
cleared. Is that correct? This is not described in the commit msg, hence
me asking.
> So where do you see the transport errors being retried?
Effectively, all transport errors do not set the DNR bit, and will
be retried within crdt, until nvme_max_retries is reached. My question
is whether this affects the reconnect flow, given that we go and
cancel all inflight requests (NVME_SC_HOST_ABORTED_CMD).
Does this mean that if the auth command is inflight at that point, it
will be retried, causing the teardown to block?
The same question applies to the other driver requests that you say
should comply with the DNR bit.
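For context, teardown cancels inflight requests roughly like this
(drivers/nvme/host/core.c, abridged). Note that NVME_SC_HOST_ABORTED_CMD
does not carry the DNR bit, so without REQ_FAILFAST_DRIVER such a
completion looks retryable:

    bool nvme_cancel_request(struct request *req, void *data)
    {
            /* don't abort an already completed request */
            if (blk_mq_request_completed(req))
                    return true;

            /* host-synthesized status, DNR not set */
            nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
            nvme_req(req)->flags |= NVME_REQ_CANCELLED;
            blk_mq_complete_request(req);
            return true;
    }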
> And I need to retry, as some controller implementations temporarily
> fail the authentication commands with the 'DNR' bit set, presumably
> while their authentication subsystem is still initializing.
That is unrelated to this patch set though.