[PATCH 0/2] nvme-fabrics: short-circuit connect retries
James Smart
jsmart2021 at gmail.com
Sun Jun 27 06:39:14 PDT 2021
On 6/26/2021 5:09 AM, Hannes Reinecke wrote:
> On 6/26/21 3:03 AM, Chao Leng wrote:
>>
>>
>> On 2021/6/24 16:10, Hannes Reinecke wrote:
>>> On 6/24/21 9:29 AM, Chao Leng wrote:
>>>>
>>>>
>>>> On 2021/6/24 13:51, Hannes Reinecke wrote:
>>>>> On 6/23/21 11:38 PM, Sagi Grimberg wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> commit f25f8ef70ce2 ("nvme-fc: short-circuit reconnect retries")
>>>>>>> allowed the fc transport to honour the DNR bit during reconnect
>>>>>>> retries, which speeds up error recovery.
>>>>>>
>>>>>> How does this speed up error recovery?
>>>>>
>>>>> Well, not exactly error recovery (as there is nothing to recover).
>>>>> But we won't attempt pointless retries, thereby reducing the noise in
>>>>> the message log.
>>>> This conflicts with the tcp and rdma target.
>>>> You may need to delete the improper NVME_SC_DNR at the target.
>>>> However, this will cause compatibility issues between different
>>>> versions.
>>>
>>> Which ones?
>> In many scenarios, the destination sets DNR for abnormal packets,
>> but each new connection may not have the same error.
>
> This patch series is only for the DNR bit set in response to the
> 'connect' command.
> If the target is not able to process the 'connect' command now, but may
> be able to in the future, it really should not set the DNR bit.
>
>>> I checked the DNR usage in the target code, and they seem to set it
>>> correctly (ie the result would not change when the command is retried).
>>> With the possible exception of ENOSPC handling, as this is arguably
>>> dynamic and might change with a retry.
>> The DNR status of the old connection may not be relevant to the
>> re-established connection.
>
> See above.
> We are just checking the DNR settings for the 'connect' command (or any
> other commands being sent during initial controller configuration).
> If that fails the connect never was properly initialized; if the
> controller would return a different status after reconnect it simply
> should not set the DNR bit ...
>
> Cheers,
>
> Hannes
Agreed. The 1.3 spec says: "If set to ‘1’, indicates that if the same
command is re-submitted to any controller in the NVM subsystem, then
that re-submitted command is expected to fail."
Thus, if the initial connect fails in this manner, any new association
will be on a different controller, and the connect on that controller is
expected to fail too. So why keep trying to connect when each attempt is
expected to fail?
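
As a rough illustration of the check under discussion (a user-space
sketch, not code from the patch series): the helper name
connect_retry_is_pointless() is made up for this example, and the
convention that a positive status is an NVMe status code while a negative
one is a local errno is an assumption stated here. NVME_SC_DNR is defined
locally with the value used in Linux's include/linux/nvme.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* DNR flag as defined in Linux's include/linux/nvme.h. */
#define NVME_SC_DNR 0x4000

/*
 * Hypothetical helper (name and calling convention are assumptions).
 *
 * status > 0: NVMe status code returned for the connect command
 * status < 0: local or transport error (negative errno)
 * status == 0: success
 *
 * Returns true when the controller set DNR on the failed connect, i.e.
 * the same command is expected to fail on any controller in the
 * subsystem, so further reconnect attempts would be pointless.
 */
static bool connect_retry_is_pointless(int status)
{
	return status > 0 && (status & NVME_SC_DNR);
}
```

A reconnect path would call this with the connect status and give up
instead of scheduling another attempt when it returns true. Transport
errors (negative values) still fall through to the normal retry logic.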
-- james