[PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs

Sagi Grimberg sagi at grimberg.me
Thu Mar 18 18:46:08 GMT 2021


>>>>>>> Will it work if nvme mpath used the REQ_NOWAIT flag for its
>>>>>>> submit_bio() call, and added the bio to the requeue_list if
>>>>>>> blk_queue_enter() fails? I think that looks like another way to
>>>>>>> resolve the deadlock, but we need the block layer to return a
>>>>>>> failed status to the original caller.
>>>>
>>>> Yes, I think BLK_MQ_REQ_NOWAIT makes total sense here.  dm-mpath also
>>>> uses it for its request allocation for similar reasons.
>>>>
>>>>>>
>>>>>> But who would kick the requeue list? And that would make
>>>>>> near-tag-exhaustion performance stink...
>>>>
>>>> The multipath code would have to kick the list.  We could also try to
>>>> split into two flags, one that affects blk_queue_enter and one that
>>>> affects the tag allocation.
>>>>
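
To make the NOWAIT idea concrete, something like the below (untested
sketch; nvme_ns_head_retry_bio() and nvme_ns_head_submit_nowait() are
made-up names, and the bi_end_io plumbing is hand-waved):

#include <linux/bio.h>
#include <linux/blkdev.h>
#include "nvme.h"        /* struct nvme_ns, struct nvme_ns_head */

/* park a bio that bounced off a frozen queue for a later retry */
static void nvme_ns_head_retry_bio(struct nvme_ns_head *head,
                                   struct bio *bio)
{
        unsigned long flags;

        spin_lock_irqsave(&head->requeue_lock, flags);
        bio_list_add(&head->requeue_list, bio);
        spin_unlock_irqrestore(&head->requeue_lock, flags);
        /* the open question: who schedules head->requeue_work, and when? */
}

static void nvme_ns_head_submit_nowait(struct nvme_ns *ns, struct bio *bio)
{
        /*
         * With REQ_NOWAIT set, a failed blk_queue_enter() completes the
         * bio with BLK_STS_AGAIN instead of blocking on a frozen queue.
         * A bi_end_io wrapper (not shown) would have to catch that
         * status and call nvme_ns_head_retry_bio() instead of failing
         * the I/O back to the caller.
         */
        bio->bi_opf |= REQ_NOWAIT;
        submit_bio(bio);
}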
>>>>> Moving nvme_start_freeze from nvme_rdma_teardown_io_queues to
>>>>> nvme_rdma_configure_io_queues can fix it.
>>>>> It can also avoid I/O hanging for a long time if the reconnection
>>>>> fails.
>>>>
>>>> Can you explain how we'd still ensure that no new commands get queued
>>>> during teardown using that scheme?
>>> 1. Teardown will cancel all inflight requests, and then multipath
>>> will clear the path.
>>> 2. Then we may freeze the controller.
>>> 3. nvme_ns_head_submit_bio cannot find the reconnecting controller
>>> as a valid path, so it is safe.
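
If I read the proposal right, the ordering would look roughly like this
(untested paraphrase, not an actual patch):

static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
                                         bool remove)
{
        /* proposal: no nvme_start_freeze() here anymore */
        nvme_stop_queues(&ctrl->ctrl);          /* quiesce */
        nvme_rdma_stop_io_queues(ctrl);
        nvme_cancel_tagset(&ctrl->ctrl);        /* step 1: cancel inflight */
        if (remove)
                nvme_start_queues(&ctrl->ctrl);
        nvme_rdma_destroy_io_queues(ctrl, remove);
}

static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl,
                                         bool new)
{
        /* ... allocate and start the I/O queues as today ... */

        if (!new) {
                nvme_start_freeze(&ctrl->ctrl); /* step 2: moved here */
                nvme_start_queues(&ctrl->ctrl);
                if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT))
                        return -ENODEV;         /* never drained; give up */
                blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
                                           ctrl->ctrl.queue_count - 1);
                nvme_unfreeze(&ctrl->ctrl);
        }
        return 0;
}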
>>
>> In non-mpath (which unfortunately is a valid use-case), there is no
>> failover, and we cannot freeze the queue after we stopped (and/or
>> started) the queues, because then nvmf_fail_nonready_command()
>> constantly returns BLK_STS_RESOURCE (just causing a re-submission over
>> and over again) and the freeze will never complete (the commands are
>> still inflight from the queue->q_usage_counter perspective).
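
To spell out the hang: the freeze side waits for q_usage_counter to
drain, but a BLK_STS_RESOURCE completion just requeues the request, so
the counter never drops to zero. Roughly:

#include <linux/blk-mq.h>

static void freeze_and_wait(struct request_queue *q)
{
        blk_freeze_queue_start(q);      /* kills q->q_usage_counter */
        /*
         * Blocks until every inflight request completes. Requests that
         * bounce off nvmf_fail_nonready_command() with BLK_STS_RESOURCE
         * are requeued immediately, remain "inflight" from the
         * q_usage_counter perspective, and this wait never returns.
         */
        blk_mq_freeze_queue_wait(q);
}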
> If the request has the REQ_FAILFAST_xxx flags set, it will hang for a
> long time if the reconnection fails.
> This is not expected.
> Also, if the controller is not live and the controller is frozen,
> fast_io_fail_tmo will not work.
> This is also not expected.
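
For context, this is roughly the check in question (paraphrased from
memory; see nvmf_fail_nonready_command() in fabrics.c for the
authoritative version). The complaint above boils down to: a frozen
queue blocks requests at blk_queue_enter(), before they can ever reach
this code, so neither REQ_FAILFAST_* nor an expired fast_io_fail_tmo
gets a chance to fail them:

blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
                                        struct request *rq)
{
        if (ctrl->state != NVME_CTRL_DELETING_NOIO &&
            ctrl->state != NVME_CTRL_DEAD &&
            !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
            !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
                return BLK_STS_RESOURCE;        /* keep retrying */

        nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
        blk_mq_start_request(rq);
        nvme_complete_rq(rq);                   /* fail fast */
        return BLK_STS_OK;
}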

No arguments that the queue needs to unfreeze asap for mpath, that
is exactly what the patch does. The only unnatural part is the
non-mpath case, where if we unfreeze the queue before we reconnect,
I/Os will fail, which is why we should also respect fast_io_fail_tmo.

The main issue here is that there are two behaviors that we
should maintain based on whether it's mpath or non-mpath...
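
Purely for illustration, the split would be something like this (all
made up; nvme_ctrl_is_mpath() does not exist):

static void nvme_reconnect_freeze_policy(struct nvme_ctrl *ctrl)
{
        if (nvme_ctrl_is_mpath(ctrl)) {  /* hypothetical helper */
                /* mpath: unfreeze asap so I/O fails over to another path */
                nvme_unfreeze(ctrl);
        } else {
                /*
                 * non-mpath: leave the queue frozen so I/O waits for the
                 * reconnect, relying on fast_io_fail_tmo to cap the wait.
                 */
        }
}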

> So I think freezing the controller when reconnecting is not a good idea.

As said, for mpath it's for sure not, but for non-mpath that matches
the expected behavior.

> It's really not good behavior to retry again and again, but it is at
> least better than a request hanging for a long time.

I am not sure I understand how that is even supposed to work TBH.

>> So I think we should still start queue freeze before we quiesce
>> the queues.
> We should unquiesce and unfreeze the queues when reconnecting, otherwise 
> fast_io_fail_tmo will not work.
>>
>> I still don't see how the mpath NOWAIT suggestion works either...
> mpath will queue the request to another live path, or requeue the
> request (if there is no usable path), so it will not wait.

Placing the request on the requeue_list is fine, but the question is
when to kick the requeue_work; nothing guarantees that an alternate path
exists, or will within a sane period. So constantly requeueing and
kicking sounds like really bad practice to me.
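
For reference, today the requeue lists are only kicked on a path state
change, roughly (paraphrasing drivers/nvme/host/multipath.c from
memory):

void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
{
        struct nvme_ns *ns;

        down_read(&ctrl->namespaces_rwsem);
        list_for_each_entry(ns, &ctrl->namespaces, list)
                if (ns->head->disk)
                        kblockd_schedule_work(&ns->head->requeue_work);
        up_read(&ctrl->namespaces_rwsem);
}

With the NOWAIT scheme there is no path event to hang that kick on,
which is exactly the problem.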


