[PATCH 5/5] nvme: retry authentication commands if DNR status bit is not set
Hannes Reinecke
hare at suse.de
Mon Feb 13 06:07:47 PST 2023
On 2/13/23 14:47, Sagi Grimberg wrote:
>
>>>>>> Clear the FAILFAST_DRIVER bit for authentication commands
>>>>>> allowing them to be retried in nvme_decide_disposition() if the DNR
>>>>>> bit is not set in the command result.
>>>>>
>>>>> Can you please explain what is this fixing? and is there a test
>>>>> to reproduce this?
>>>>>
>>>>>>
>>>>>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>>>>>> ---
>>>>>> drivers/nvme/host/auth.c | 2 ++
>>>>>> 1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>>>>>> index ef5dcd885b0c..935f340607a7 100644
>>>>>> --- a/drivers/nvme/host/auth.c
>>>>>> +++ b/drivers/nvme/host/auth.c
>>>>>> @@ -87,6 +87,8 @@ static int nvme_auth_submit(struct nvme_ctrl
>>>>>> *ctrl, int qid,
>>>>>> "qid %d auth_submit failed to map, error %d\n",
>>>>>> qid, ret);
>>>>>> } else {
>>>>>> + /* Clear failfast flag to allow for retries */
>>>>>> + req->cmd_flags &= ~REQ_FAILFAST_DRIVER;
>>>>>
>>>>> This should come right after the allocation.
>>>>> But what makes this more special than any other fabrics/admin command?
>>>>> Why is it different from connect, for example?
>>>>
>>>> It is not different; the patch does, however, introduce a change in
>>>> behaviour.
>>>> Hence I didn't enable it for all commands: previously we didn't
>>>> retry at all, and it's questionable whether we should for things
>>>> like 'reg_read'/'reg_write' and friends.
>>>>
>>>> You are correct about connect, though; we really should retry that
>>>> one, too, if the DNR bit is not set.
>>>
>>> I'm not sure. If the transport is unavailable, DNR cannot be set, which
>>> will lead to a retry of the connect (or equivalent) command. Today the
>>> host will schedule another full controller connect within
>>> reconnect_delay, which is what we want, right?
>>
>> I knew this would be coming up, which is why I left it out of the
>> initial patchset.
>>
>> Long explanation:
>>
>> Thing is, connect handling is not _just_ connect handling.
>> With Linux we only have a single (ioctl-like) entry point for the
>> userspace 'nvme connect' call.
>>
>> That really does several things:
>> - set up the transport
>> - create the transport association for the admin queue
>> - create the admin queue
>> - send 'connect' over that queue
>> - create the transport associations for the I/O queues
>> - create the I/O queues
>> - send 'connect' over those queues
>>
>> And my point here is that the actual 'connect' commands really are
>> straight NVMe commands, and as such should be subject to the normal
>> status evaluation rules for NVMe commands.
>> The problem here is the strange 'half-breed' nature of the connect
>> command:
>>
>> _Technically_ the association and the 'connect' command are
>> independent, and as such we would be entirely within spec to allow the
>> 'connect'
>> command to fail without having the association torn down. But due to
>> our implementation that would leave us unable to re-use the association
>> for further connect commands, as that would need to be triggered by
>> userspace (which will always create a new association).
>>
>> Consequently I would advocate that any non-retryable failure of the
>> 'connect' command should always tear down the association, too.
>>
>> That would be a deviation from our normal rules (where we always retry
>> transport failures), but really is something we need to do if we
>> enable DNR evaluation for the 'connect' command.
>
> How the host handles a failing connect (or any other request) is
> entirely the host's choice.
>
> In this patch you effectively decided to retry transport errors as well,
> which will retry up to nvme_max_retries times. Can you please help me
> understand what the patch is actually solving? I'm assuming that this is
> an actual completion that you see from the controller; my second
> question is what happens with this patch when the transport is the
> source of the completion?
>
Hmm. All this patch does is enable retries by modifying the flow in
nvme_decide_disposition().
And nvme_decide_disposition() only deals with NVMe status values; if
nvme_req(req)->status were a plain error number we'd have far worse
problems than just retries.
So where do you see transport errors being retried?
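
For reference, the decision boils down to roughly the following check
in nvme_decide_disposition() (a simplified sketch, with the multipath
failover handling omitted; the exact code in drivers/nvme/host/core.c
may differ slightly):

	if (likely(nvme_req(req)->status == 0))
		return COMPLETE;

	if (blk_noretry_request(req) ||		/* REQ_FAILFAST_* flags */
	    (nvme_req(req)->status & NVME_SC_DNR) ||
	    nvme_req(req)->retries >= nvme_max_retries)
		return COMPLETE;

	return RETRY;

With REQ_FAILFAST_DRIVER cleared on the authentication request, only
the DNR bit and the retry counter decide whether the command is
retried.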
And I need to retry because some controller implementations temporarily
fail the authentication commands without setting the 'DNR' bit,
presumably while the authentication subsystem is still initializing.
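
As for doing the clearing right after the allocation, a rough sketch of
how that could look (the allocation call and variable names here are
assumptions about the surrounding code, not the actual auth.c hunk):

	/* sketch only; allocation flags and context differ in the real code */
	req = blk_mq_alloc_request(q, nvme_req_op(cmd), 0);
	if (IS_ERR(req))
		return PTR_ERR(req);
	nvme_init_request(req, cmd);

	/*
	 * nvme_init_request() marks the request REQ_FAILFAST_DRIVER;
	 * clear it so nvme_decide_disposition() may retry when the
	 * DNR bit is not set.
	 */
	req->cmd_flags &= ~REQ_FAILFAST_DRIVER;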
Cheers,
Hannes