[PATCH 04/17] nvme: don't call nvme_kill_queues from nvme_remove_namespaces

Sagi Grimberg sagi at grimberg.me
Tue Oct 25 13:17:04 PDT 2022



On 10/25/22 20:43, Keith Busch wrote:
> On Tue, Oct 25, 2022 at 07:40:07AM -0700, Christoph Hellwig wrote:
>> @@ -4560,15 +4560,6 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>>   	/* prevent racing with ns scanning */
>>   	flush_work(&ctrl->scan_work);
>>   
>> -	/*
>> -	 * The dead states indicates the controller was not gracefully
>> -	 * disconnected. In that case, we won't be able to flush any data while
>> -	 * removing the namespaces' disks; fail all the queues now to avoid
>> -	 * potentially having to clean up the failed sync later.
>> -	 */
>> -	if (ctrl->state == NVME_CTRL_DEAD)
>> -		nvme_kill_queues(ctrl);
>> -
>>   	/* this is a no-op when called from the controller reset handler */
>>   	nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
>>   
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index ec034d4dd9eff..f971e96ffd3f6 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3249,6 +3249,16 @@ static void nvme_remove(struct pci_dev *pdev)
>>   
>>   	flush_work(&dev->ctrl.reset_work);
>>   	nvme_stop_ctrl(&dev->ctrl);
>> +
>> +	/*
>> +	 * The dead states indicates the controller was not gracefully
>> +	 * disconnected. In that case, we won't be able to flush any data while
>> +	 * removing the namespaces' disks; fail all the queues now to avoid
>> +	 * potentially having to clean up the failed sync later.
>> +	 */
>> +	if (dev->ctrl.state == NVME_CTRL_DEAD)
>> +		nvme_kill_queues(&dev->ctrl);
>> +
>>   	nvme_remove_namespaces(&dev->ctrl);
>>   	nvme_dev_disable(dev, true);
>>   	nvme_remove_attrs(dev);
>> -- 
>> 2.30.2
>>
> 
> We still need the flush_work(scan_work) prior to killing the queues. It
> looks like it could safely be moved to nvme_stop_ctrl(), which might
> make it easier on everyone if it were there.

If we do end up moving it to nvme_stop_ctrl, can we make a sub-version
of nvme_stop_ctrl that cannot block on I/O (i.e. without ana/scan/auth)?
That would be for multipathing, where we want to tear down the controller
quickly so we can fail over I/O ASAP.

IIRC this is why scan_work is not in nvme_stop_ctrl to begin with, but
it is also possible that there was some other deadlock caused by that.
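
To make the question concrete, here is a rough sketch of such a split. It
is a userspace model, not kernel code: every name in it (mock_ctrl,
mock_flush, mock_stop_ctrl_nowait, mock_stop_ctrl) is hypothetical, and
treating keep-alive as the one work item that never waits on I/O is an
assumption made only for illustration. The only part taken from the
discussion above is that ana/scan/auth are the pieces that may block.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum mock_work {
	MOCK_KEEP_ALIVE,	/* assumed (for illustration) never to wait on I/O */
	MOCK_ANA,
	MOCK_SCAN,
	MOCK_AUTH,
	MOCK_NR_WORK
};

struct mock_ctrl {
	bool pending[MOCK_NR_WORK];		/* work still queued or running */
	bool may_wait_on_io[MOCK_NR_WORK];	/* flushing it can wait on I/O */
	int blocking_flushes;			/* flushes that could have blocked */
};

void mock_ctrl_init(struct mock_ctrl *ctrl)
{
	for (int i = 0; i < MOCK_NR_WORK; i++) {
		ctrl->pending[i] = true;
		ctrl->may_wait_on_io[i] =
			(i == MOCK_ANA || i == MOCK_SCAN || i == MOCK_AUTH);
	}
	ctrl->blocking_flushes = 0;
}

/* Stand-in for flushing one work item; counts flushes that may block. */
void mock_flush(struct mock_ctrl *ctrl, enum mock_work w)
{
	if (!ctrl->pending[w])
		return;
	if (ctrl->may_wait_on_io[w])
		ctrl->blocking_flushes++;
	ctrl->pending[w] = false;
}

/* Fast variant: stops only the work that cannot wait on I/O. */
void mock_stop_ctrl_nowait(struct mock_ctrl *ctrl)
{
	mock_flush(ctrl, MOCK_KEEP_ALIVE);
}

/* Full variant: the fast part first, then everything that may block. */
void mock_stop_ctrl(struct mock_ctrl *ctrl)
{
	mock_stop_ctrl_nowait(ctrl);
	mock_flush(ctrl, MOCK_ANA);
	mock_flush(ctrl, MOCK_SCAN);
	mock_flush(ctrl, MOCK_AUTH);
}
```

In this model a failover path would call only the nowait variant, while
the removal path would call the full one before killing the queues.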


