[PATCH v2 0/1] nvme: Fix problem when booting from NVMe drive was leading to a hang.

Michael Kropaczek michael.kropaczek at solidigm.com
Mon Mar 4 10:25:07 PST 2024


Description:

During an endurance test in which a system was repeatedly rebooted from an
NVMe drive, the boot process occasionally hung. The test was configured for
1000 reboot cycles at 120 s intervals; the hang occurred after ~300 cycles.
Investigation showed that the NVMe driver did not disable the host memory
buffer (HMB) during shutdown, leaving the NVMe controller in a state that
prevented proper initialization in the BIOS pre-boot stage.
Adding a call to nvme_set_host_mem(dev, 0) in the shutdown path fixed the
issue.
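
For illustration only, the control flow of the fix can be modeled in user
space roughly as below. This is a sketch under assumptions, not driver code:
struct mock_dev, mock_set_host_mem() and mock_dev_disable() are hypothetical
stand-ins for struct nvme_dev, nvme_set_host_mem() and nvme_dev_disable(),
and the command is only recorded rather than submitted. The opcode, feature
ID and enable bit follow the NVMe Set Features (Host Memory Buffer)
definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Set Features admin opcode and the Host Memory Buffer feature ID. */
#define ADMIN_SET_FEATURES	0x09
#define FEAT_HOST_MEM_BUF	0x0d
#define HOST_MEM_ENABLE		(1u << 0)	/* dword 11, bit 0 */

/* Hypothetical stand-in for struct nvme_dev; not the kernel structure. */
struct mock_dev {
	bool		hmb;		/* HMB currently handed to the controller */
	uint32_t	last_opcode;	/* last recorded admin opcode */
	uint32_t	last_fid;	/* last recorded feature ID */
	uint32_t	last_dword11;	/* last recorded enable bits */
	int		cmds_sent;	/* number of recorded commands */
};

/*
 * Models nvme_set_host_mem(dev, bits): records the Set Features command
 * instead of submitting it. Passing bits == 0 asks the controller to stop
 * using the host memory buffer. Updating dev->hmb here is a choice made
 * for this mock.
 */
static int mock_set_host_mem(struct mock_dev *dev, uint32_t bits)
{
	assert(dev != NULL);

	dev->last_opcode = ADMIN_SET_FEATURES;
	dev->last_fid = FEAT_HOST_MEM_BUF;
	dev->last_dword11 = bits;
	dev->cmds_sent++;
	dev->hmb = (bits & HOST_MEM_ENABLE) != 0;
	return 0;
}

/*
 * Models the part of nvme_dev_disable() touched by the patch: the HMB is
 * disabled only on a real shutdown and only if one is in use. The rest of
 * the teardown (quiescing queues and so on) is omitted.
 */
static void mock_dev_disable(struct mock_dev *dev, bool shutdown)
{
	if (shutdown && dev->hmb)
		mock_set_host_mem(dev, 0);
}
```

In this model a plain reset (shutdown == false) leaves the buffer enabled,
while a shutdown sends exactly one disable command and a second shutdown
call sends none, because the mock has already cleared its hmb flag.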

Michael Kropaczek (1):
  nvme: Fix problem when booting from NVMe drive was leading to a hang.

 drivers/nvme/host/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)


base-commit: 8d30528a170905ede9ab6ab81f229e441808590b
-- 
2.34.1

From 9eec234181015af624d8e5cd8670ba5d82d0ce7e Mon Sep 17 00:00:00 2001
From: Michael Kropaczek <michael.kropaczek at solidigm.com>
Date: Thu, 29 Feb 2024 15:33:27 -0800
Subject: [PATCH v2 1/1] nvme: Fix problem when booting from NVMe drive was
 leading to a hang.
To: linux-nvme at lists.infradead.org
Cc: Keith Busch <kbusch at kernel.org>,
    Jens Axboe <axboe at fb.com>,
    Christoph Hellwig <hch at lst.de>,
    Sagi Grimberg <sagi at grimberg.me>,
    Michael Kropaczek <michael.kropaczek at solidigm.com>

On certain host architectures/HW, DRAM retains its contents across reboot
cycles. Certain NVMe controllers were observed accessing host memory after
startup, which led to an undefined state and prevented proper initialization
in the BIOS boot stage. Disabling the host memory buffer during host shutdown
prevents the problem.

Signed-off-by: Michael Kropaczek <michael.kropaczek at solidigm.com>
---
 drivers/nvme/host/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e6267a6aa380..e5292c7b301f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2593,6 +2593,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
 	}
 
+	/*
+	 * On certain host HW, DRAM retains its contents across reboot cycles.
+	 * Some controllers were observed accessing host memory after a reset,
+	 * which led to an undefined state that prevented initialization.
+	 */
+	if (shutdown && dev->hmb)
+		nvme_set_host_mem(dev, 0);
+
 	nvme_quiesce_io_queues(&dev->ctrl);
 
 	if (!dead && dev->ctrl.queue_count > 0) {
-- 
2.34.1



