[PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs

Christoph Hellwig hch at lst.de
Sat Mar 20 06:11:23 GMT 2021


On Fri, Mar 19, 2021 at 12:34:25PM -0700, Sagi Grimberg wrote:
>
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index a1d476e1ac020f..92adebfaf86fd1 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -309,6 +309,7 @@ blk_qc_t nvme_ns_head_submit_bio(struct bio *bio)
>>   	 */
>>   	blk_queue_split(&bio);
>>   +retry:
>>   	srcu_idx = srcu_read_lock(&head->srcu);
>>   	ns = nvme_find_path(head);
>>   	if (likely(ns)) {
>> @@ -316,7 +317,12 @@ blk_qc_t nvme_ns_head_submit_bio(struct bio *bio)
>>   		bio->bi_opf |= REQ_NVME_MPATH;
>>   		trace_block_bio_remap(bio, disk_devt(ns->head->disk),
>>   				      bio->bi_iter.bi_sector);
>> -		ret = submit_bio_noacct(bio);
>> +
>> +		if (!blk_mq_submit_bio_direct(bio, &ret)) {
>> +			nvme_mpath_clear_current_path(ns);
>> +			srcu_read_unlock(&head->srcu, srcu_idx);
>
> Its a bit unusual to see mutation of a data protected by srcu while
> under the srcu_read_lock, can that be problematic somehow?

Hmm.  I don't think head->srcu is intended to protect the current path.
We also call nvme_mpath_clear_current_path from nvme_complete_rq or
nvme_ns_remove, which has no locking at all.  The srcu protection is
for head->list, but leaks into the current path due to the __rcu
annotations.



More information about the Linux-nvme mailing list