[PATCH 5/5] nvme: retry authentication commands if DNR status bit is not set
Hannes Reinecke
hare at suse.de
Mon Feb 13 06:07:47 PST 2023
On 2/13/23 14:47, Sagi Grimberg wrote:
>
>>>>>> Clear the FAILFAST_DRIVER bit for authentication commands
>>>>>> allowing them to be retried in nvme_decide_disposition() if the DNR
>>>>>> bit is not set in the command result.
>>>>>
>>>>> Can you please explain what is this fixing? and is there a test
>>>>> to reproduce this?
>>>>>
>>>>>>
>>>>>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>>>>>> ---
>>>>>> drivers/nvme/host/auth.c | 2 ++
>>>>>> 1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>>>>>> index ef5dcd885b0c..935f340607a7 100644
>>>>>> --- a/drivers/nvme/host/auth.c
>>>>>> +++ b/drivers/nvme/host/auth.c
>>>>>> @@ -87,6 +87,8 @@ static int nvme_auth_submit(struct nvme_ctrl
>>>>>> *ctrl, int qid,
>>>>>> "qid %d auth_submit failed to map, error %d\n",
>>>>>> qid, ret);
>>>>>> } else {
>>>>>> + /* Clear failfast flag to allow for retries */
>>>>>> + req->cmd_flags &= ~REQ_FAILFAST_DRIVER;
>>>>>
>>>>> This should come right after the allocation.
>>>>> But what makes this more special than any other fabrics/admin command?
>>>>> Why is it different from connect, for example?
>>>>
>>>> It is not different; the patch does, however, introduce a change in
>>>> behaviour.
>>>> Hence I didn't enable it for all commands: previously we didn't
>>>> retry at all, and it's questionable whether we should for things
>>>> like 'reg_read'/'reg_write' and friends.
>>>>
>>>> You are correct about connect, though; we really should retry that
>>>> one, too, if the DNR bit is not set.
>>>
>>> I'm not sure. If the transport is unavailable, DNR cannot be set, which
>>> will lead to a retry of the connect (or equivalent) command. Today the
>>> host will schedule another full controller connect within
>>> reconnect_delay, which is what we want, right?
>>
>> I knew this would be coming up, which is why I left it out of the
>> initial patchset.
>>
>> Long explanation:
>>
>> Thing is, connect handling is not _just_ connect handling.
>> With Linux we only have a single (ioctl-like) entry point for the
>> userspace 'nvme connect' call.
>>
>> That really does several things:
>> - set up the transport
>> - create the transport association for the admin queue
>> - create the admin queue
>> - send 'connect' over that queue
>> - create the transport associations for the I/O queues
>> - create the I/O queues
>> - send 'connect' over those queues
>>
>> And my point here is that the actual 'connect' commands really are
>> straight NVMe commands, and as such should be subject to the normal
>> status evaluation rules for NVMe commands.
>> The problem here is the strange 'half-breed' nature of the connect
>> command:
>>
>> _Technically_ the association and the 'connect' command are
>> independent, and as such we would be entirely within spec to allow the
>> 'connect'
>> command to fail without having the association torn down. But due to
>> our implementation that would leave us unable to re-use the association
>> for further connect commands, as that would need to be triggered by
>> userspace (which will always create a new association).
>>
>> Consequently I would advocate that any non-retryable failure of the
>> 'connect' command should always tear down the association, too.
>>
>> That would be a deviation from our normal rules (where we always retry
>> transport failures), but really is something we need to do if we
>> enable DNR evaluation for the 'connect' command.
>
> How the host handles a failing connect (or any other request) is
> entirely the host's choice.
>
> In this patch you effectively decided to retry transport errors as well,
> which will retry up to nvme_max_retries times. Can you please help me
> understand what the patch is actually solving? I'm assuming that this is
> an actual completion that you see from the controller; my second
> question is what happens with this patch when the transport is the
> source of the completion?
>
Hmm. All this patch does is enable retries by modifying the flow in
nvme_decide_disposition().
And nvme_decide_disposition() only deals with NVMe status values; if
nvme_req(req)->status were a plain error number we'd have far worse
problems than just retries.
So where do you see transport errors being retried?
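
For reference, the decision boils down to roughly the following check
in nvme_decide_disposition() (a simplified sketch, with the multipath
failover handling omitted; the exact code in drivers/nvme/host/core.c
may differ slightly):

	if (likely(nvme_req(req)->status == 0))
		return COMPLETE;

	if (blk_noretry_request(req) ||		/* REQ_FAILFAST_* flags */
	    (nvme_req(req)->status & NVME_SC_DNR) ||
	    nvme_req(req)->retries >= nvme_max_retries)
		return COMPLETE;

	return RETRY;

With REQ_FAILFAST_DRIVER cleared on the authentication request, only
the DNR bit and the retry counter decide whether the command is
retried.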
And I need to retry because some controller implementations temporarily
fail the authentication commands without setting the 'DNR' bit,
presumably while the authentication subsystem is still initializing.
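
As for doing the clearing right after the allocation, a rough sketch of
how that could look (the allocation call and variable names here are
assumptions about the surrounding code, not the actual auth.c hunk):

	/* sketch only; allocation flags and context differ in the real code */
	req = blk_mq_alloc_request(q, nvme_req_op(cmd), 0);
	if (IS_ERR(req))
		return PTR_ERR(req);
	nvme_init_request(req, cmd);

	/*
	 * nvme_init_request() marks the request REQ_FAILFAST_DRIVER;
	 * clear it so nvme_decide_disposition() may retry when the
	 * DNR bit is not set.
	 */
	req->cmd_flags &= ~REQ_FAILFAST_DRIVER;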
Cheers,
Hannes