[PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs
Sagi Grimberg
sagi at grimberg.me
Wed Mar 17 08:16:26 GMT 2021
>>>> Will it work if nvme mpath used request NOWAIT flag for its submit_bio()
>>>> call, and add the bio to the requeue_list if blk_queue_enter() fails? I
>>>> think that looks like another way to resolve the deadlock, but we need
>>>> the block layer to return a failed status to the original caller.
>
> Yes, I think BLK_MQ_REQ_NOWAIT makes total sense here.
BTW, the specific hang reported is not blocking on tag allocation, but
rather on blk_queue_enter() blocking on a frozen queue.
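(To make that concrete: with a frozen queue the percpu_ref tryget in
blk_queue_enter() fails, and unless the caller passed BLK_MQ_REQ_NOWAIT
it sleeps on mq_freeze_wq until the queue is unfrozen or dying. From
memory the logic in block/blk-core.c is roughly:

    int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
    {
            while (true) {
                    if (percpu_ref_tryget_live(&q->q_usage_counter))
                            return 0;       /* queue not frozen, proceed */

                    if (flags & BLK_MQ_REQ_NOWAIT)
                            return -EBUSY;  /* caller asked not to sleep */

                    /* this wait is where the reported hang sits */
                    wait_event(q->mq_freeze_wq,
                               !q->mq_freeze_depth || blk_queue_dying(q));
                    if (blk_queue_dying(q))
                            return -ENODEV;
            }
    }

so the submitter never even reaches tag allocation.)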
> dm-mpath also uses it for its request allocation for similar reasons.
That is the rq-based dm, and I think it is because dm_mq_queue_rq is
non-blocking. Care to explain how that is similar to nvme-mpath?
I don't see how bio-based dm cares about any of this...
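(For reference, the request-based side allocates its clone on the lower
device without blocking and turns a failure into a requeue; if I recall
correctly, multipath_clone_and_map() in drivers/md/dm-mpath.c does
something like:

    clone = blk_get_request(q, rq->cmd_flags | REQ_NOWAIT,
                            BLK_MQ_REQ_NOWAIT);
    if (IS_ERR(clone)) {
            /* frozen/dying queue or no tag: let dm-rq requeue the rq */
            return DM_MAPIO_REQUEUE;
    }

but there blk-mq re-dispatches the requeued request for us, which is
exactly the piece that is missing on the bio side.)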
>>> But who would kick the requeue list? and that would make near-tag-exhaust performance stink...
>
> The multipath code would have to kick the list.
When? I'm not following your thoughts...
You are suggesting that we call submit_bio, have it fail, put the bio on
the requeue_list, and then what? Blindly kick the requeue list? Try to
see if there is an alternate path and then kick the list? For every bio
that comes in?
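(Today the requeue lists are only kicked from controller/path state
change events, not from the submission path; from memory the helper in
drivers/nvme/host/multipath.c is roughly:

    void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
    {
            struct nvme_ns *ns;

            down_read(&ctrl->namespaces_rwsem);
            list_for_each_entry(ns, &ctrl->namespaces, list)
                    if (ns->head->disk)
                            kblockd_schedule_work(&ns->head->requeue_work);
            up_read(&ctrl->namespaces_rwsem);
    }

so a bio that failed submission with NOWAIT would just sit on the
requeue_list until some unrelated state change happens to fire this.)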
> We could also try to
> split into two flags, one that affects blk_queue_enter and one that
> affects the tag allocation.
If this is something that can work reliably then we're better off with
it, plus we can probably kill the srcu as well. But unfortunately I
don't see how this would work.
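(If we did go that way, I imagine it would look something like a new
flag (the name is made up here, nothing like it exists today) that only
affects the queue_enter step:

    /* hypothetical, for illustration only */
    BLK_MQ_REQ_FAIL_FROZEN = (__force blk_mq_req_flags_t)(1 << 4),

and then in blk_queue_enter():

    /* fail fast on a frozen queue, but let tag allocation further
     * down still sleep unless BLK_MQ_REQ_NOWAIT was also passed */
    if (flags & (BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_FAIL_FROZEN))
            return -EBUSY;

but the hard part is still what the nvme-mpath caller does with the
failure, which is the same kicking question as above.)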
>> moving nvme_start_freeze from nvme_rdma_teardown_io_queues to nvme_rdma_configure_io_queues can fix it.
>> It can also avoid I/O hanging for a long time if reconnection fails.
>
> Can you explain how we'd still ensure that no new commands get queued
> during teardown using that scheme?
quiescing the queue would prevent new submissions from coming down to
the driver, but I don't see how this move can help here...
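(To spell out what I mean: quiesce and freeze are different things, and
from memory the rdma teardown today does both, roughly in this order:

    /* nvme_rdma_teardown_io_queues(), as I remember it */
    nvme_start_freeze(&ctrl->ctrl); /* blk_freeze_queue_start(): new
                                     * blk_queue_enter() callers block,
                                     * or fail if they passed NOWAIT */
    nvme_stop_queues(&ctrl->ctrl);  /* blk_mq_quiesce_queue(): stop
                                     * dispatching ->queue_rq() to the
                                     * driver */

while the reconnect path later does nvme_wait_freeze() + nvme_unfreeze(),
which is where I/O can sit for a long time if the reconnect keeps
failing. Moving nvme_start_freeze() to the configure side only changes
when queue entry starts to block; the quiesce is what keeps commands
away from the driver during teardown.)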