[PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs
Christoph Hellwig
hch at lst.de
Wed Mar 17 06:59:10 GMT 2021
On Wed, Mar 17, 2021 at 10:55:57AM +0800, Chao Leng wrote:
>>> Will it work if nvme mpath used request NOWAIT flag for its submit_bio()
>>> call, and add the bio to the requeue_list if blk_queue_enter() fails? I
>>> think that looks like another way to resolve the deadlock, but we need
>>> the block layer to return a failed status to the original caller.
Yes, I think BLK_MQ_REQ_NOWAIT makes total sense here. dm-mpath also
uses it for its request allocation for similar reasons.
>>
>> But who would kick the requeue list? and that would make near-tag-exhaust performance stink...
The multipath code would have to kick the list. We could also try to
split into two flags, one that affects blk_queue_enter and one that
affects the tag allocation.
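As a rough illustration of the two-flag split, here is a userspace toy
model. It is not kernel code: every toy_* name and both flags are
hypothetical, and none of this is an existing block-layer API. It only
sketches the intended behaviour, where one flag controls whether entering
a frozen queue may sleep and the other controls whether tag allocation
may sleep, with the multipath side kicking its own requeue list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags for illustration; NOT existing BLK_MQ_REQ_* values. */
enum {
	TOY_NOWAIT_ENTER = 1 << 0,	/* fail instead of sleeping on a frozen queue */
	TOY_NOWAIT_TAG   = 1 << 1,	/* fail instead of sleeping for a free tag */
};

enum toy_status {
	TOY_OK,			/* request got a tag and was issued */
	TOY_REQUEUED,		/* parked on the multipath requeue list */
	TOY_WOULD_SLEEP,	/* a real submitter would block here */
};

struct toy_queue {
	bool frozen;		/* stand-in for a frozen request queue */
	int free_tags;		/* stand-in for the tag set */
};

struct toy_mpath {
	int requeued;		/* number of bios parked for a later kick */
};

static enum toy_status toy_submit(struct toy_queue *q, struct toy_mpath *m,
				  unsigned int flags)
{
	assert(q->free_tags >= 0);

	/* Step 1: queue entry, governed only by TOY_NOWAIT_ENTER. */
	if (q->frozen) {
		if (!(flags & TOY_NOWAIT_ENTER))
			return TOY_WOULD_SLEEP;
		m->requeued++;
		return TOY_REQUEUED;
	}

	/* Step 2: tag allocation, governed only by TOY_NOWAIT_TAG. */
	if (q->free_tags == 0) {
		if (!(flags & TOY_NOWAIT_TAG))
			return TOY_WOULD_SLEEP;
		m->requeued++;
		return TOY_REQUEUED;
	}

	q->free_tags--;
	return TOY_OK;
}

/* The multipath side kicks the list itself, e.g. once the queue is unfrozen. */
static int toy_kick_requeue(struct toy_queue *q, struct toy_mpath *m,
			    unsigned int flags)
{
	int issued = 0;
	int pending = m->requeued;

	m->requeued = 0;
	while (pending-- > 0) {
		enum toy_status st = toy_submit(q, m, flags);

		if (st == TOY_OK)
			issued++;
		else if (st == TOY_WOULD_SLEEP)
			m->requeued++;	/* the model keeps it parked rather than sleeping */
		/* TOY_REQUEUED: toy_submit() already parked it again */
	}
	return issued;
}
```

In this model a submitter that passes only TOY_NOWAIT_ENTER never blocks
on a frozen queue, but still waits for a tag when the tag set is
exhausted, so it would not bounce through the requeue list in that case.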
> moving nvme_start_freeze from nvme_rdma_teardown_io_queues to nvme_rdma_configure_io_queues can fix it.
> It can also avoid I/O hang long time if reconnection failed.
Can you explain how we'd still ensure that no new commands get queued
during teardown using that scheme?