[PATCH v7] nvme-fabrics: reject I/O to offline device

Sagi Grimberg sagi at grimberg.me
Tue Aug 11 16:56:04 EDT 2020


>> Commands get stuck while the host NVMe controller (TCP or RDMA) is in
>> the reconnect state. The NVMe controller enters the reconnect state when
>> it loses its connection to the target. It tries to reconnect every 10
>> seconds (by default) until it reconnects successfully or the reconnect
>> time-out is reached. The default reconnect time-out is 10 minutes.
>>
>> To avoid this long delay caused by the default time-out, we introduce a
>> new session parameter, "fast_io_fail_tmo". The timeout is measured in
>> seconds from when the controller starts reconnecting; once it expires,
>> any command is rejected. The new parameter value may be passed during
>> 'connect'.
>> The default value of 0 means no timeout (similar to the current behavior).
> 
> I'd like to remind you that this improvement is still pending commit.
> Please take a look.
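For illustration, the timeout policy described in the quoted patch can be modelled as a per-command decision. This is a minimal sketch, not the patch's code: struct ctrl_model, enum io_action and classify_io() are hypothetical names, and the sketch assumes time is tracked in whole seconds, with the start of the reconnect recorded when the controller enters the reconnect state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a host controller's reconnect state; these names
 * are illustrative and are not the kernel's actual structures. */
struct ctrl_model {
    bool reconnecting;              /* controller lost its connection */
    unsigned int fast_io_fail_tmo;  /* seconds; 0 disables fast failing */
    unsigned int reconnect_start;   /* time (s) the reconnect state began */
};

enum io_action { IO_DISPATCH, IO_REQUEUE, IO_FAIL };

/* Decide what to do with an I/O submitted at time `now` (seconds). */
enum io_action classify_io(const struct ctrl_model *c, unsigned int now)
{
    if (!c->reconnecting)
        return IO_DISPATCH;         /* connection is live: send the I/O */
    if (c->fast_io_fail_tmo == 0)
        return IO_REQUEUE;          /* default: hold until reconnect or give-up */

    /* Precondition: time does not run backwards (unsigned subtraction). */
    assert(now >= c->reconnect_start);
    if (now - c->reconnect_start >= c->fast_io_fail_tmo)
        return IO_FAIL;             /* fast-fail window elapsed: reject */
    return IO_REQUEUE;              /* still inside the window: hold it */
}
```

With fast_io_fail_tmo = 30 and a reconnect that began at t = 100 s, an I/O submitted at t = 110 s is held for requeue while one at t = 130 s is failed; with the default of 0 it is held until the reconnect succeeds or the reconnect time-out expires.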

I think we still have an open question as to why this is fabrics-specific,
and, if it is needed in fabrics, why it is not needed in PCI as well.

Keith? Personally speaking, I also share Christoph's opinion that
if it's not clearly fabrics-specific, we should try to unify PCI
and fabrics.

Your thoughts on this?
