[PATCH V2 0/4] nvme: fix two kinds of IO hang from removing NSs

Sagi Grimberg sagi at grimberg.me
Tue Jun 20 06:40:49 PDT 2023


>>> Hello,
>>>
>>> The first three patches fix an IO hang when controller removal interrupts
>>> error recovery and the queue is left frozen.
>>>
>>> The fourth patch fixes an IO hang when the controller is left as unquiesce.
>>
>> Ming, what happened to nvme-tcp/rdma move of freeze/unfreeze to the
>> connect patches?
> 
> I'd suggest handling all drivers (including nvme-pci) with the same logic to
> avoid extra maintenance burden wrt. error handling, but it looks like Keith
> worries that delaying the freeze may cause too many requests to be queued
> during error handling, and that might cause user reports.

For nvme-tcp/rdma, your patch also addresses IOs not failing over because
they block on queue enter. So I definitely want this for fabrics.

AFAICT nvme-pci would also want to fail over ASAP for dual-ported
multipathed devices. I'm not sure this is something we are interested
in optimizing, though, as pci either succeeds at the reset or removes
the gendisk. But the time frame is different for fabrics for sure.


