[PATCH 2/2] nvme-multipath: don't block on blk_queue_enter of the underlying device
Christoph Hellwig
hch at lst.de
Tue Mar 23 16:15:44 GMT 2021
On Tue, Mar 23, 2021 at 12:36:40AM -0700, Sagi Grimberg wrote:
>> The process:
>> 1. nvme_ns_head_submit_bio calls srcu_read_lock(&head->srcu).
>> 2. nvme_ns_head_submit_bio adds the bio to current->bio_list instead of
>> waiting for the frozen queue.
>
> Nothing guarantees that you have a bio_list active at any point in time,
> in fact for a workload that submits one by one you will always drain
> that list directly in the submission...
It should always be active when ->submit_bio is called.
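As a rough illustration of that invariant, here is a small user-space model. It is a sketch and not kernel code. It assumes that the block layer's submission loop installs an on-stack list in current->bio_list before invoking ->submit_bio and clears it once the list is drained. All names prefixed with model_, the current_bio_list variable, and the stacked field are hypothetical stand-ins.

```c
/* User-space model only. Names echo the kernel for readability. */
#include <assert.h>
#include <stddef.h>

struct bio {
	struct bio *next;
	int stacked;		/* 1: aimed at a stacking node, 0: at a leaf device */
};

struct bio_list {
	struct bio *head, *tail;
};

/* Stand-in for current->bio_list. NULL means no submission loop is running. */
static struct bio_list *current_bio_list;
static int submit_calls_without_list;	/* stays 0 if the invariant holds */
static int completed;

static void bio_list_add(struct bio_list *bl, struct bio *bio)
{
	assert(bl != NULL);
	bio->next = NULL;
	if (bl->tail)
		bl->tail->next = bio;
	else
		bl->head = bio;
	bl->tail = bio;
}

static struct bio *bio_list_pop(struct bio_list *bl)
{
	struct bio *bio = bl->head;

	if (bio) {
		bl->head = bio->next;
		if (!bl->head)
			bl->tail = NULL;
		bio->next = NULL;
	}
	return bio;
}

static void model_submit_bio_noacct(struct bio *bio);

/* Models a driver's ->submit_bio method. */
static void model_submit_bio(struct bio *bio)
{
	if (!current_bio_list)
		submit_calls_without_list++;

	if (bio->stacked) {
		/* Remap to the lower device and resubmit: this only queues. */
		bio->stacked = 0;
		model_submit_bio_noacct(bio);
	} else {
		completed++;
	}
}

/* Models the submission entry point and its drain loop. */
static void model_submit_bio_noacct(struct bio *bio)
{
	struct bio_list on_stack = { NULL, NULL };

	if (current_bio_list) {
		/* Already inside a submission loop: defer instead of recursing. */
		bio_list_add(current_bio_list, bio);
		return;
	}

	current_bio_list = &on_stack;
	do {
		model_submit_bio(bio);
	} while ((bio = bio_list_pop(&on_stack)) != NULL);
	current_bio_list = NULL;
}
```

In this model, a workload that submits bios one at a time still sees a non-NULL list inside model_submit_bio. The list is simply empty again by the time the outer call returns, which is the "drained directly in the submission" case.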
>
>> 3. nvme_ns_head_submit_bio calls srcu_read_unlock(&head->srcu, srcu_idx).
>> So nvme_ns_head_submit_bio does not hold head->srcu for long while the queue
>> is frozen, which avoids the deadlock.
>>
>> Sagi, suggest trying this patch.
>
> The above reproduces with the patch applied on upstream nvme code.
Weird. I don't think the deadlock in your original report should
happen due to this. Can you take a look at the callstacks in the
reproduced deadlock? Either we're missing something obvious or it is a
somewhat different deadlock.