[PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs
Christoph Hellwig
hch at lst.de
Sat Mar 20 06:11:23 GMT 2021
On Fri, Mar 19, 2021 at 12:34:25PM -0700, Sagi Grimberg wrote:
>
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index a1d476e1ac020f..92adebfaf86fd1 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -309,6 +309,7 @@ blk_qc_t nvme_ns_head_submit_bio(struct bio *bio)
>> */
>> blk_queue_split(&bio);
>> +retry:
>> srcu_idx = srcu_read_lock(&head->srcu);
>> ns = nvme_find_path(head);
>> if (likely(ns)) {
>> @@ -316,7 +317,12 @@ blk_qc_t nvme_ns_head_submit_bio(struct bio *bio)
>> bio->bi_opf |= REQ_NVME_MPATH;
>> trace_block_bio_remap(bio, disk_devt(ns->head->disk),
>> bio->bi_iter.bi_sector);
>> - ret = submit_bio_noacct(bio);
>> +
>> + if (!blk_mq_submit_bio_direct(bio, &ret)) {
>> + nvme_mpath_clear_current_path(ns);
>> + srcu_read_unlock(&head->srcu, srcu_idx);
>
> Its a bit unusual to see mutation of a data protected by srcu while
> under the srcu_read_lock, can that be problematic somehow?
Hmm. I don't think head->srcu is intended to protect the current path.
We also call nvme_mpath_clear_current_path from nvme_complete_rq or
nvme_ns_remove, which has no locking at all. The srcu protection is
for head->list, but leaks into the current path due to the __rcu
annotations.
More information about the Linux-nvme
mailing list