[PATCH v3 8/9] nvme-rdma: fix timeout handler

Christoph Hellwig hch at lst.de
Thu Aug 20 02:10:33 EDT 2020


On Wed, Aug 19, 2020 at 10:36:50PM -0700, Sagi Grimberg wrote:
> When a request times out in a LIVE state, we simply trigger error
> recovery and let the error recovery handle the request cancellation.
> However, when a request times out in a non-LIVE state, we make sure to
> complete it immediately, as it might block controller setup or teardown
> and prevent forward progress.
> 
> However, tearing down the entire set of I/O and admin queues causes a
> freeze/unfreeze imbalance (q->mq_freeze_depth) and is really overkill
> for what we actually need, which is to just fence controller teardown
> that may be running, stop the queue, and cancel the request if it is
> not already completed.
> 
> Now that we have the controller teardown_lock, we can safely serialize
> request cancellation. This addresses a hang caused by calling an extra
> queue freeze on the controller namespaces, which prevented unfreeze
> from completing correctly.
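
For readers following along, the narrowed handling described above (fence
teardown, stop the queue, cancel the request only if it has not completed)
could be modeled in userspace roughly as below. This is only a sketch under
stated assumptions, not the code from the patch: every sketch_* name and the
status values are hypothetical stand-ins, a pthread mutex stands in for the
controller teardown_lock, and a plain flag stands in for stopping the queue.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are not the kernel's structures. */
enum { SKETCH_STATUS_OK = 0, SKETCH_STATUS_HOST_ABORTED = 1 };

struct sketch_ctrl {
	pthread_mutex_t teardown_lock;	/* models the controller teardown_lock */
};

struct sketch_queue {
	struct sketch_ctrl *ctrl;
	bool live;			/* cleared once the queue is stopped */
};

struct sketch_request {
	struct sketch_queue *queue;
	bool completed;
	int status;
};

/*
 * Complete a request that timed out while the controller was not LIVE.
 * Only the request's own queue is touched; nothing is frozen or unfrozen.
 */
static void sketch_complete_timed_out(struct sketch_request *rq)
{
	struct sketch_queue *queue = rq->queue;
	struct sketch_ctrl *ctrl = queue->ctrl;

	/* Fence any teardown that may be completing requests concurrently. */
	pthread_mutex_lock(&ctrl->teardown_lock);

	/* Stop the queue so no further completions arrive for it. */
	queue->live = false;

	/* Cancel the request only if nobody else completed it first. */
	if (!rq->completed) {
		rq->status = SKETCH_STATUS_HOST_ABORTED;
		rq->completed = true;
	}

	pthread_mutex_unlock(&ctrl->teardown_lock);
}
```

The point of the model is that a request which was already completed under
the lock keeps its status, and no freeze counter is involved at all, so
there is nothing left to unbalance.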

I still think this should be dev_info instead of dev_warn, but otherwise:

Reviewed-by: Christoph Hellwig <hch at lst.de>


