[PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs

Sagi Grimberg sagi at grimberg.me
Wed Mar 17 18:43:59 GMT 2021


>>>>> Will it work if nvme mpath used the REQ_NOWAIT flag for its
>>>>> submit_bio() call, and added the bio to the requeue_list if
>>>>> blk_queue_enter() fails? I think that looks like another way to
>>>>> resolve the deadlock, but we need the block layer to return a
>>>>> failed status to the original caller.
>>
>> Yes, I think BLK_MQ_REQ_NOWAIT makes total sense here.  dm-mpath also
>> uses it for its request allocation for similar reasons.
>>
>>>>
>>>> But who would kick the requeue list? And that would make
>>>> performance near tag exhaustion stink...
>>
>> The multipath code would have to kick the list.  We could also try to
>> split into two flags, one that affects blk_queue_enter and one that
>> affects the tag allocation.
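
Just to make sure we're talking about the same thing, the bio-level idea
would be something like this (untested sketch; note that today the block
layer does not hand the blk_queue_enter() failure back to the submitter,
so the error handling below is hypothetical):

        /* in nvme_ns_head_submit_bio(), hypothetical error handling */
        bio->bi_opf |= REQ_NOWAIT;
        ret = submit_bio_noacct(bio);   /* hypothetical: failure returned to caller */
        if (ret == -EAGAIN) {
                /* queue entering failed (e.g. frozen) - park the bio */
                spin_lock_irq(&head->requeue_lock);
                bio_list_add(&head->requeue_list, bio);
                spin_unlock_irq(&head->requeue_lock);
        }

And then the mpath code would still need to kick head->requeue_work
(e.g. via kblockd_schedule_work()) once the path/controller state
changes, otherwise the bio just sits there.
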
>>
>>> Moving nvme_start_freeze from nvme_rdma_teardown_io_queues to
>>> nvme_rdma_configure_io_queues can fix it.
>>> It can also avoid a long I/O hang if the reconnection fails.
>>
>> Can you explain how we'd still ensure that no new commands get queued
>> during teardown using that scheme?
> 1. Teardown will cancel all inflight requests, and then multipath will
> clear the path.
> 2. Then we may freeze the controller.
> 3. nvme_ns_head_submit_bio cannot find the reconnecting controller as a
> valid path, so it is safe.
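
So IIUC the proposal looks roughly like this (a sketch of my reading
only, not a tested patch):

        /* nvme_rdma_teardown_io_queues(): no nvme_start_freeze() anymore */
        nvme_stop_queues(&ctrl->ctrl);
        nvme_sync_io_queues(&ctrl->ctrl);
        nvme_rdma_stop_io_queues(ctrl);
        nvme_cancel_tagset(&ctrl->ctrl);

        /* nvme_rdma_configure_io_queues(), !new (reconnect) path: freeze here instead */
        nvme_start_freeze(&ctrl->ctrl);
        nvme_start_queues(&ctrl->ctrl);
        if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
                /* could not drain in time - fail the reconnect instead of hanging */
                goto out_wait_freeze_timed_out;
        }
        blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset, ctrl->ctrl.queue_count - 1);
        nvme_unfreeze(&ctrl->ctrl);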

In non-mpath (which unfortunately is a valid use-case), there is no
failover, and we cannot freeze the queue after we stopped (and/or
started) the queues, because then nvmf_fail_nonready_command()
constantly returns BLK_STS_RESOURCE (just causing a re-submission over
and over again) and the freeze will never complete (the commands are
still in flight from the queue's q_usage_counter perspective).

So I think we should still start queue freeze before we quiesce
the queues.
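
IOW, keep roughly the ordering the teardown has today (from memory,
just to illustrate the ordering, not a patch):

        nvme_start_freeze(&ctrl->ctrl); /* new submitters block in blk_queue_enter() */
        nvme_stop_queues(&ctrl->ctrl);  /* quiesce the hw queues */
        nvme_sync_io_queues(&ctrl->ctrl);
        nvme_rdma_stop_io_queues(ctrl);
        nvme_cancel_tagset(&ctrl->ctrl); /* complete in-flight requests so q_usage_counter can drain */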

I still don't see how the mpath NOWAIT suggestion works either...


