[PATCH 5/5] nvme: Add two-pass shutdown support.

Jeremy Allison jra at samba.org
Tue Jan 30 10:54:03 PST 2024


On Tue, Jan 30, 2024 at 01:15:43PM +0200, Sagi Grimberg wrote:
>
>>This works with the two-pass shutdown mechanism setup for the PCI
>>drivers and participates to provide the shutdown_wait
>>method at the pci_driver structure level.
>>
>>This patch changes the nvme shutdown() method to pass
>>down the NVME_PCI_DISABLE_SHUTDOWN_TWOPASS enum value instead
>>of NVME_PCI_DISABLE_SHUTDOWN. nvme_dev_disable() is changed
>>to call nvme_request_shutdown() instead of nvme_disable_ctrl()
>>in this case.
>>
>>nvme_request_shutdown() sets the shutdown bit,
>>but does not wait for completion.
>>
>>The nvme_shutdown_wait() callback is added to synchronously
>>wait for the NVME_CSTS_SHST_CMPLT bit meaning the nvme
>>device has shutdown.
>>
>>This change speeds up the shutdown in a system which hosts
>>many controllers.
>>
>>Based on work by Tanjore Suresh <tansuresh at google.com>
>>
>>Signed-off-by: Jeremy Allison <jallison at ciq.com>
>>---
>>  drivers/nvme/host/pci.c | 39 ++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 36 insertions(+), 3 deletions(-)
>>
>>diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>index 8ee77f755d9d..bbd89eecf05a 100644
>>--- a/drivers/nvme/host/pci.c
>>+++ b/drivers/nvme/host/pci.c
>>@@ -2607,7 +2607,14 @@ static void nvme_dev_disable(struct nvme_dev *dev,
>>  	if (!dead && dev->ctrl.queue_count > 0) {
>>  		nvme_delete_io_queues(dev);
>>-		nvme_disable_ctrl(&dev->ctrl, shutdown);
>>+		/*
>>+		 * NVME_PCI_DISABLE_SHUTDOWN_TWOPASS requests shutdown
>>+		 * but doesn't wait for completion.
>>+		 */
>>+		if (shutdown_type == NVME_PCI_DISABLE_SHUTDOWN_TWOPASS)
>>+			nvme_request_shutdown(&dev->ctrl);
>>+		else
>>+			nvme_disable_ctrl(&dev->ctrl, shutdown);
>>  		nvme_poll_irqdisable(&dev->queues[0]);
>>  	}
>>  	nvme_suspend_io_queues(dev);
>>@@ -2625,7 +2632,7 @@ static void nvme_dev_disable(struct nvme_dev *dev,
>>  	 * must flush all entered requests to their failed completion to avoid
>>  	 * deadlocking blk-mq hot-cpu notifier.
>>  	 */
>>-	if (shutdown) {
>>+	if (shutdown_type == NVME_PCI_DISABLE_SHUTDOWN) {
>>  		nvme_unquiesce_io_queues(&dev->ctrl);
>>  		if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
>>  			nvme_unquiesce_admin_queue(&dev->ctrl);
>>@@ -3128,7 +3135,32 @@ static void nvme_shutdown(struct pci_dev *pdev)
>>  {
>>  	struct nvme_dev *dev = pci_get_drvdata(pdev);
>>-	nvme_disable_prepare_reset(dev, NVME_PCI_DISABLE_SHUTDOWN);
>>+	nvme_disable_prepare_reset(dev, NVME_PCI_DISABLE_SHUTDOWN_TWOPASS);
>>+}
>>+
>>+static void nvme_shutdown_wait(struct pci_dev *pdev)
>>+{
>>+	struct nvme_dev *dev = pci_get_drvdata(pdev);
>>+
>>+	mutex_lock(&dev->shutdown_lock);
>
>General question that just came up, is there any risk here that the
>shutdown_lock is released and re-taken later while the shutdown is 
>still in progress? I don't spot anything immediate, but perhaps Keith 
>can
>comment?

My gut feel (not knowing the kernel nearly as well as Samba
of course :-) is that this isn't a problem.

But that's because this two-stage code path of:

mutex_lock(&dev->shutdown_lock);
..
	set device shutdown bit NVME_CC_SHN_NORMAL
..
mutex_unlock(&dev->shutdown_lock);

followed by:

mutex_lock(&dev->shutdown_lock);
..
	wait for shutdown bit NVME_CSTS_SHST_CMPLT
..
mutex_unlock(&dev->shutdown_lock);

Is only invoked on the shutdown() method of
the nvme device. At this point the system is
going down and user-space processes are already
terminated.

Also the device has been removed from the
devices_kset list under a lock and the operations
are being done under a device_lock(dev).

So I can't see what other things could
go wrong with the two-stage approach.

Happy to learn if I'm wrong though :-).

>If this change does expose us to potential issues, you can simply
>release the shutdown_lock here (while taking it in ?

You appear to have accidently a word here :-).



More information about the Linux-nvme mailing list