[PATCH 5/5] nvme: Add two-pass shutdown support.
Jeremy Allison
jra at samba.org
Tue Jan 30 10:54:03 PST 2024
On Tue, Jan 30, 2024 at 01:15:43PM +0200, Sagi Grimberg wrote:
>
>>This works with the two-pass shutdown mechanism setup for the PCI
>>drivers and participates to provide the shutdown_wait
>>method at the pci_driver structure level.
>>
>>This patch changes the nvme shutdown() method to pass
>>down the NVME_PCI_DISABLE_SHUTDOWN_TWOPASS enum value instead
>>of NVME_PCI_DISABLE_SHUTDOWN. nvme_dev_disable() is changed
>>to call nvme_request_shutdown() instead of nvme_disable_ctrl()
>>in this case.
>>
>>nvme_request_shutdown() sets the shutdown bit,
>>but does not wait for completion.
>>
>>The nvme_shutdown_wait() callback is added to synchronously
>>wait for the NVME_CSTS_SHST_CMPLT bit meaning the nvme
>>device has shutdown.
>>
>>This change speeds up the shutdown in a system which hosts
>>many controllers.
>>
>>Based on work by Tanjore Suresh <tansuresh at google.com>
>>
>>Signed-off-by: Jeremy Allison <jallison at ciq.com>
>>---
>> drivers/nvme/host/pci.c | 39 ++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 36 insertions(+), 3 deletions(-)
>>
>>diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>index 8ee77f755d9d..bbd89eecf05a 100644
>>--- a/drivers/nvme/host/pci.c
>>+++ b/drivers/nvme/host/pci.c
>>@@ -2607,7 +2607,14 @@ static void nvme_dev_disable(struct nvme_dev *dev,
>> if (!dead && dev->ctrl.queue_count > 0) {
>> nvme_delete_io_queues(dev);
>>- nvme_disable_ctrl(&dev->ctrl, shutdown);
>>+ /*
>>+ * NVME_PCI_DISABLE_SHUTDOWN_TWOPASS requests shutdown
>>+ * but doesn't wait for completion.
>>+ */
>>+ if (shutdown_type == NVME_PCI_DISABLE_SHUTDOWN_TWOPASS)
>>+ nvme_request_shutdown(&dev->ctrl);
>>+ else
>>+ nvme_disable_ctrl(&dev->ctrl, shutdown);
>> nvme_poll_irqdisable(&dev->queues[0]);
>> }
>> nvme_suspend_io_queues(dev);
>>@@ -2625,7 +2632,7 @@ static void nvme_dev_disable(struct nvme_dev *dev,
>> * must flush all entered requests to their failed completion to avoid
>> * deadlocking blk-mq hot-cpu notifier.
>> */
>>- if (shutdown) {
>>+ if (shutdown_type == NVME_PCI_DISABLE_SHUTDOWN) {
>> nvme_unquiesce_io_queues(&dev->ctrl);
>> if (dev->ctrl.admin_q && !blk_queue_dying(dev->ctrl.admin_q))
>> nvme_unquiesce_admin_queue(&dev->ctrl);
>>@@ -3128,7 +3135,32 @@ static void nvme_shutdown(struct pci_dev *pdev)
>> {
>> struct nvme_dev *dev = pci_get_drvdata(pdev);
>>- nvme_disable_prepare_reset(dev, NVME_PCI_DISABLE_SHUTDOWN);
>>+ nvme_disable_prepare_reset(dev, NVME_PCI_DISABLE_SHUTDOWN_TWOPASS);
>>+}
>>+
>>+static void nvme_shutdown_wait(struct pci_dev *pdev)
>>+{
>>+ struct nvme_dev *dev = pci_get_drvdata(pdev);
>>+
>>+ mutex_lock(&dev->shutdown_lock);
>
>General question that just came up, is there any risk here that the
>shutdown_lock is released and re-taken later while the shutdown is
>still in progress? I don't spot anything immediate, but perhaps Keith
>can
>comment?
My gut feel (not knowing the kernel nearly as well as Samba
of course :-) is that this isn't a problem.
But that's because this two-stage code path of:
mutex_lock(&dev->shutdown_lock);
..
set device shutdown bit NVME_CC_SHN_NORMAL
..
mutex_unlock(&dev->shutdown_lock);
followed by:
mutex_lock(&dev->shutdown_lock);
..
wait for shutdown bit NVME_CSTS_SHST_CMPLT
..
mutex_unlock(&dev->shutdown_lock);
Is only invoked on the shutdown() method of
the nvme device. At this point the system is
going down and user-space processes are already
terminated.
Also the device has been removed from the
devices_kset list under a lock and the operations
are being done under a device_lock(dev).
So I can't see what other things could
go wrong with the two-stage approach.
Happy to learn if I'm wrong though :-).
>If this change does expose us to potential issues, you can simply
>release the shutdown_lock here (while taking it in ?
You appear to have accidently a word here :-).
More information about the Linux-nvme
mailing list