[PATCH 9/9] [RFC] nvme: Fix a race condition

Tue Sep 27 10:09:20 PDT 2016

On 09/27/2016 09:56 AM, James Bottomley wrote:
> On Tue, 2016-09-27 at 09:43 -0700, Bart Van Assche wrote:
>> On 09/27/2016 09:31 AM, Steve Wise wrote:
>>>> @@ -2079,11 +2075,15 @@ EXPORT_SYMBOL_GPL(nvme_kill_queues);
>>>>  void nvme_stop_queues(struct nvme_ctrl *ctrl)
>>>>  {
>>>>  	struct nvme_ns *ns;
>>>> +	struct request_queue *q;
>>>>
>>>>  	mutex_lock(&ctrl->namespaces_mutex);
>>>>  	list_for_each_entry(ns, &ctrl->namespaces, list) {
>>>> -		blk_mq_cancel_requeue_work(ns->queue);
>>>> -		blk_mq_stop_hw_queues(ns->queue);
>>>> +		q = ns->queue;
>>>> +		blk_quiesce_queue(q);
>>>> +		blk_mq_cancel_requeue_work(q);
>>>> +		blk_mq_stop_hw_queues(q);
>>>> +		blk_resume_queue(q);
>>>>  	}
>>>>  	mutex_unlock(&ctrl->namespaces_mutex);
>>>
>>> Hey Bart, should nvme_stop_queues() really be resuming the blk
>>> queue?
>>
>> Hello Steve,
>>
>> Would you perhaps prefer that blk_resume_queue(q) is called from
>> nvme_start_queues()? I think that would make the NVMe code harder to
>> review. The above code won't cause any unexpected side effects if an
>> NVMe namespace is removed after nvme_stop_queues() has been called
>> and before nvme_start_queues() is called. Moving the
>> blk_resume_queue(q) call into nvme_start_queues() will only work as
>> expected if no namespaces are added nor removed between the
>> nvme_stop_queues() and nvme_start_queues() calls. I'm not familiar
>> enough with the NVMe code to know whether or not this change is safe
>> ...
>
> It's something that looks obviously wrong, so explain why you need to
> do it, preferably in a comment above the function.

Hello James and Steve,

I will add a comment.

Please note that the above patch does not change the behavior of 
nvme_stop_queues() except that it causes nvme_stop_queues() to wait 
until any ongoing nvme_queue_rq() calls have finished. 
blk_resume_queue() does not affect the value of the BLK_MQ_S_STOPPED bit 
that has been set by blk_mq_stop_hw_queues(). All it does is to resume 
pending blk_queue_enter() calls and to ensure that future 
blk_queue_enter() calls do not block. Even after blk_resume_queue() has 
been called if a new request is queued queue_rq() won't be invoked 
because the BLK_MQ_S_STOPPED bit is still set. Patch "dm: Fix a race 
condition related to stopping and starting queues" realizes a similar 
change in the dm driver and that change has been tested extensively.

Bart.