[PATCH] nvme-fabrics: fix crash for no IO queues

Sagi Grimberg sagi at grimberg.me
Tue Mar 16 23:52:42 GMT 2021


>>>>> Now we have 2 choices:
>>>>> 1. fail the connection when unable to set up any I/O queues.
>>>>> 2. do not allow queue requests when the queue is not live.
>>>>
>>>> Okay, so there are different views on how to handle this. I personally
>>>> find in-band administration for a misbehaving device a good thing to
>>>> have, but I won't 'nak' it if the consensus from the people using this
>>>> is to go the other way.
>>>
>>> While I understand that this can be useful, I've seen it do more harm
>>> than good. It is really puzzling to people when the reported controller
>>> state is live (and even optimized) yet no I/O is making progress for no
>>> apparent reason. And logs are rarely consulted in these cases.
>>>
>>> I am also opting for failing it and rescheduling a reconnect.
>>
>> Agree with Sagi. We also hit this issue a long time ago, and I made the same
>> change (commit 834d3710a093a) that Sagi is suggesting: if the prior
>> controller instance had I/O queues, but the new/reconnected controller fails
>> to create I/O queues, then the controller create is failed and a reconnect is
>> scheduled.
> 
> Okay, fair enough.
> 
> One more question: if the controller is in such a bad way that it will
> never create I/O queues without additional intervention, will this
> behavior have the driver scheduling reconnects indefinitely?

Until either ctrl_loss_tmo expires or the user is tired of this
controller and manually disconnects.
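
[Editorial illustration, not part of the original thread.] Below is a minimal
user-space C model of the behaviour agreed on above: if a (re)connected
controller cannot create any I/O queues, the setup is failed and a reconnect is
rescheduled until ctrl_loss_tmo expires. All identifiers here (demo_ctrl,
demo_setup_io_queues, the constants) are invented for this sketch and are not
the actual nvme-fabrics code paths.

    #include <stdbool.h>
    #include <stdio.h>

    #define CTRL_LOSS_TMO_SEC   600   /* plays the role of the ctrl_loss_tmo option */
    #define RECONNECT_DELAY_SEC 10    /* plays the role of the reconnect_delay option */

    struct demo_ctrl {
            int io_queue_count;   /* I/O queues the target actually granted */
            int elapsed_sec;      /* time spent attempting to reconnect */
    };

    /* Pretend to (re)connect and create I/O queues; 0 granted means failure. */
    static bool demo_setup_io_queues(struct demo_ctrl *ctrl)
    {
            return ctrl->io_queue_count > 0;
    }

    int main(void)
    {
            struct demo_ctrl ctrl = { .io_queue_count = 0, .elapsed_sec = 0 };

            for (;;) {
                    if (demo_setup_io_queues(&ctrl)) {
                            printf("controller live with %d I/O queues\n",
                                   ctrl.io_queue_count);
                            return 0;
                    }

                    /* Choice 1 from the thread: fail the setup instead of
                     * exposing a "live" controller that cannot do any I/O. */
                    printf("no I/O queues, failing setup, rescheduling reconnect\n");

                    ctrl.elapsed_sec += RECONNECT_DELAY_SEC;
                    if (ctrl.elapsed_sec >= CTRL_LOSS_TMO_SEC) {
                            printf("ctrl_loss_tmo expired, removing controller\n");
                            return 1;
                    }
            }
    }

In the real transports the reconnect is deferred work rather than a busy loop,
and the user can also end it early with a manual disconnect, but the intent is
the same: no I/O queues means the controller is not presented as live.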


