[PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs

Sagi Grimberg sagi at grimberg.me
Tue Mar 16 06:25:07 GMT 2021


>>> We also found similar deadlocks in the older version.
>>> However, with the latest code, it does not block grabbing the nshead srcu
>>> when the ctrl is frozen.
>>> related patches:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/block/blk-core.c?id=fe2008640ae36e3920cf41507a84fb5d3227435a 
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5a6c35f9af416114588298aa7a90b15bbed15a41 
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/block/blk-core.c?id=ed00aabd5eb9fb44d6aff1173234a2e911b9fead 
>>>
>>> I am not sure they are the same problem.
>>
>> It's not the same problem.
>>
>> When we tear down the io queues, we freeze the namespaces' request queues.
>> This means that concurrent mpath submit_bio calls can now block with
>> the srcu lock taken.
>
> What is the call trace of ->submit_bio()?
> The requeue work or normal submit bio?

Both.

submit_bio_noacct will try to get the queue->q_usage_counter
(blk_queue_enter), will fail because the queue is frozen, and will then
block until the queue is unfrozen, which wakes it up to try again...
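
To make the sequence concrete, here is a rough user-space model of the
enter/freeze/unfreeze interaction. It is only a sketch, not the kernel
code: toy_queue and the toy_queue_*() helpers are hypothetical names made
up for illustration, and where the real submitter would sleep, the model
just records a waiter and returns -1 so the caller can retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; not a kernel type. */
struct toy_queue {
	bool frozen;   /* set while the io queues are being torn down */
	int usage;     /* models the usage counter taken on entry */
	int waiters;   /* submitters that would be asleep in the kernel */
};

/*
 * Model of the enter step. Takes a usage reference and returns 0 when the
 * queue is live. When the queue is frozen, the real code would sleep here;
 * the model records a waiter and returns -1 instead.
 */
static int toy_queue_enter(struct toy_queue *q)
{
	if (q->frozen) {
		q->waiters++;
		return -1;
	}
	q->usage++;
	return 0;
}

/* Drop the usage reference taken by a successful toy_queue_enter(). */
static void toy_queue_exit(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* Freezing makes every later toy_queue_enter() fail until unfreeze. */
static void toy_queue_freeze(struct toy_queue *q)
{
	q->frozen = true;
}

/* Unfreeze "wakes" all recorded waiters; returns how many must retry. */
static int toy_queue_unfreeze(struct toy_queue *q)
{
	int woken = q->waiters;

	q->frozen = false;
	q->waiters = 0;
	return woken;
}
```

A caller would freeze the queue, see toy_queue_enter() fail for as long as
it stays frozen, and see the retry succeed only after toy_queue_unfreeze().
The model leaves out the srcu side: in the case discussed here the
submitter is already inside the srcu read section when it gets stuck at
the enter step.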


