[PATCH 5/5] nvme: retry authentication commands if DNR status bit is not set

Mon Feb 13 05:24:43 PST 2023

On 2/13/23 11:33, Sagi Grimberg wrote:
> 
>>>> Clear the FAILFAST_DRIVER bit for authentication commands
>>>> allowing them to be retried in nvme_decide_disposition() if the DNR
>>>> bit is not set in the command result.
>>>
>>> Can you please explain what is this fixing? and is there a test
>>> to reproduce this?
>>>
>>>>
>>>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>>>> ---
>>>>   drivers/nvme/host/auth.c | 2 ++
>>>>   1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>>>> index ef5dcd885b0c..935f340607a7 100644
>>>> --- a/drivers/nvme/host/auth.c
>>>> +++ b/drivers/nvme/host/auth.c
>>>> @@ -87,6 +87,8 @@ static int nvme_auth_submit(struct nvme_ctrl 
>>>> *ctrl, int qid,
>>>>               "qid %d auth_submit failed to map, error %d\n",
>>>>               qid, ret);
>>>>       } else {
>>>> +        /* Clear failfast flag to allow for retries */
>>>> +        req->cmd_flags &= ~REQ_FAILFAST_DRIVER;
>>>
>>> This should come right after the allocation.
>>> But what makes this special than any other fabrics/admin command?
>>> Why is it different than connect for example?
>>
>> It is not different; it does, however, introduce a difference in 
>> behaviour.
>> And hence I didn't enable it for all commands, as previously we didn't 
>> retry, and it's questionable if we should for things like 
>> 'reg_read'/'reg_write' and friends.
>>
>> But you are correct for connect, though; we really should retry that 
>> one, too, if the DNR bit is not set.
> 
> I'm not sure. if the transport is unavailable, we cannot set DNR, which
> will lead to a retry of the connect (or equivalent) command. Today the
> host will schedule another full controller connect within
> reconnect_delay, which is what we want right?

I knew this would be coming up, which is why I left it out of the 
initial patchset.

Long explanation:

Thing is, connect handling is not _just_ connect handling.
With linux we only have a single (ioctl-like) entry for the userspace 
'nvme connect' call.

That really does several things:
- Setting up the transport
- Create the transport association for the admin q
- Create the admin q
- send 'connect' over that queue
- Create transport associations for the i/o qs
- create i/o qs
- send 'connect' over those queues

And my point here is that the actual 'connect' commands really are 
straight NVMe commands. And as such should be subjected to the normal 
status evaluation rules for NVMe commands.
The problem here is the strange 'half-breed' nature of the connect command:

_Technically_ the association and the 'connect' command are independent, 
and as such we would be entirely within spec to allow the 'connect'
command to fail without having the association torn down. But due to
our implementation that would leave us unable to re-use the association
for further connect commands, as that would need to be triggered by
userspace (which will always create a new association).

Consequently I would advocate for having any non-retryable failure of 
the 'connect' command to always tear down the association, too.

That would be a deviation from our normal rules (where we always retry
transport failures), but really is something we need to do if we enable 
DNR evaluation for the 'connect' command.

And we would need an agreement on this, hence I didn't attempt it in 
this patchset.

(Actually I'm wondering if it's worth a session at LSF; this has been a 
long-running topic without us being able to make meaningful progress)

Cheers,

Hannes