[PATCH V2 0/4] nvme: fix two kinds of IO hang from removing NSs

Sagi Grimberg sagi at grimberg.me
Wed Jun 21 03:13:05 PDT 2023


>>>>> Hello,
>>>>>
>>>>> The first three patches fix an IO hang when controller removal interrupts
>>>>> error recovery and the queue is left frozen.
>>>>>
>>>>> The 4th patch fixes io hang when controller is left as unquiesce.
>>>>
>>>> Ming, what happened to nvme-tcp/rdma move of freeze/unfreeze to the
>>>> connect patches?
>>>
>>> I'd suggest handling all drivers (including nvme-pci) with the same logic to
>>> avoid extra maintenance burden in error handling, but it looks like Keith
>>> worries that delaying the freeze may let too many requests queue up during
>>> error handling, and that might lead to user reports.
>>
>> For nvme-tcp/rdma your patch also addresses IO not failing over because
>> they block on queue enter. So I definitely want this for fabrics.
> 
> The patch in the following link should fix these issues too:
> 
> https://lore.kernel.org/linux-block/ZJGmW7lEaipT6saa@ovpn-8-23.pek2.redhat.com/T/#u
> 
> I guess you still want the paired freeze patch because it makes freeze &
> unfreeze more reliable in error handling. If so, I can make a fabrics-only
> change for you.

Not sure exactly what reliability is being referred to here. I agree that
there is an issue with controller delete during error recovery. The patch
was a way to side-step it, great. But it also addressed I/O blocking on
queue enter and not failing over.

So yes, for fabrics we should have it. I would argue that it would be
the right thing to do for pci as well. But I won't argue if Keith feels
otherwise.


