[PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended

Wenbo Wang wenbo.wang at memblaze.com
Wed Feb 3 08:35:03 PST 2016


For async I/O (executed by the run_work worker) this should work. However, for sync I/O in blk_mq_run_hw_queue it does not seem to help; there is still a window.
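
To make the window concrete, here is a minimal single-threaded sketch that replays the ordering from the trace quoted below. It is only a model, not kernel code: struct fake_queue, the enum, and all function names are hypothetical stand-ins, and the two flags merely approximate BLK_MQ_S_STOPPED and the mapped doorbell register.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the queue state; not a kernel structure. */
struct fake_queue {
	bool stopped;         /* models BLK_MQ_S_STOPPED */
	bool doorbell_mapped; /* models the mapped doorbell register */
};

enum outcome { RANG_DOORBELL, SKIPPED, TOUCHED_UNMAPPED };

/* Models the test of BLK_MQ_S_STOPPED in __blk_mq_run_hw_queue. */
static bool submit_allowed(const struct fake_queue *q)
{
	return !q->stopped;
}

/* Models nvme_dev_disable(): stop the queues, then unmap the doorbell. */
static void disable_device(struct fake_queue *q)
{
	q->stopped = true;
	q->doorbell_mapped = false;
}

/* Models the doorbell write in nvme_queue_rq(). */
static enum outcome ring_doorbell(const struct fake_queue *q)
{
	return q->doorbell_mapped ? RANG_DOORBELL : TOUCHED_UNMAPPED;
}

/*
 * Replays the submit path. The disable path runs either before the
 * stopped check or inside the window between the check and the
 * doorbell write.
 */
static enum outcome replay(bool disable_inside_window)
{
	struct fake_queue q = { .stopped = false, .doorbell_mapped = true };

	if (!disable_inside_window)
		disable_device(&q);
	if (!submit_allowed(&q))
		return SKIPPED;
	if (disable_inside_window)
		disable_device(&q);
	return ring_doorbell(&q);
}
```

In this model replay(false) is stopped by the flag check, while replay(true) reaches the doorbell write after the unmap. That second case is the one a stopped-flag test alone cannot close, because nothing orders the check against the unmap.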

-----Original Message-----
From: Keith Busch [mailto:keith.busch at intel.com] 
Sent: Wednesday, February 3, 2016 10:41 PM
To: Wenbo Wang
Cc: Jens Axboe; Wenbo Wang; linux-kernel at vger.kernel.org; linux-nvme at lists.infradead.org; Wenwei.Tao
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended

On Tue, Feb 02, 2016 at 07:15:57AM +0000, Wenbo Wang wrote:
> I did the following test to validate the issue.
> 
> 1. Modify code as below to increase the chance of races.
> 	Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
> 	Add 10s delay before __nvme_submit_cmd()
> 2. Run dd and at the same time, echo 1 to reset_controller to trigger
>    device reset. Finally kernel crashes due to accessing unmapped door
>    bell register.
> 
> Following is the execution order of the two code paths:
> __blk_mq_run_hw_queue
>   Test BLK_MQ_S_STOPPED
> 					nvme_dev_disable()
> 					     nvme_stop_queues()  <-- set BLK_MQ_S_STOPPED
> 					     nvme_dev_unmap(dev)  <-- unmap door bell
>   nvme_queue_rq()
>       Touch door bell	<-- panic here

Does the following force the first to complete before the unmap?

---
@@ -1415,10 +1421,21 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
+		blk_sync_queue(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
--

