[PATCHv2] nvme-pci: try function level reset on init failure
Keith Busch
kbusch at meta.com
Tue Jul 15 12:16:27 PDT 2025
From: Keith Busch <kbusch at kernel.org>
NVMe devices from multiple vendors appear to get stuck in a reset state
that we can't get out of with an NVMe level Controller Reset. The kernel
would report these with messages that look like:

  Device not ready; aborting reset, CSTS=0x1

These have historically required a power cycle to make them usable
again, but in many cases, a PCIe FLR is sufficient to restart operation
without a power cycle. Try it if the initial controller reset fails
during any nvme reset attempt.
Cc: Chaitanya Kulkarni <chaitanyak at nvidia.com>
Signed-off-by: Keith Busch <kbusch at kernel.org>
---
v1->v2:
Added code comment explaining the escalation
Add an informational kernel message that this event occurred
Use the "pcie_reset_flr()" API instead of "pcie_flr()" since that one
checks for quirks and capabilities before writing FLR config bits.
Note, NVMe PCI Transport Spec mandates FLR capability, so the latter
should not apply to any compliant device, but you never know...
drivers/nvme/host/pci.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4cf87fb5d8573..f8f8cb6a4786a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2067,8 +2067,28 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
 	 * might be pointing at!
 	 */
 	result = nvme_disable_ctrl(&dev->ctrl, false);
-	if (result < 0)
-		return result;
+	if (result < 0) {
+		struct pci_dev *pdev = to_pci_dev(dev->dev);
+
+		/*
+		 * The NVMe Controller Reset method did not get an expected
+		 * CSTS.RDY transition, so something with the device appears to
+		 * be stuck. Use the lower level and bigger hammer PCIe
+		 * Function Level Reset to attempt restoring the device to its
+		 * initial state, and try again.
+		 */
+		result = pcie_reset_flr(pdev, false);
+		if (result < 0)
+			return result;
+
+		pci_restore_state(pdev);
+		result = nvme_disable_ctrl(&dev->ctrl, false);
+		if (result < 0)
+			return result;
+
+		dev_info(&dev->ctrl.device,
+			 "controller reset completed after pcie flr\n");
+	}
 
 	result = nvme_alloc_queue(dev, 0, NVME_AQ_DEPTH);
 	if (result)
--
2.47.1
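
For anyone who wants to reason about the escalation order outside the kernel tree, here is a rough userspace sketch of the control flow in the hunk above. It is not driver code: struct mock_dev and the mock_*() helpers are hypothetical stand-ins for nvme_disable_ctrl() and pcie_reset_flr(), and the sketch simply assumes that an FLR clears the stuck state, which real hardware may or may not do.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct mock_dev {
	bool stuck;	/* controller reset cannot complete while set */
	int flr_count;	/* number of times the FLR fallback ran */
};

/* Stand-in for nvme_disable_ctrl(): fails while the device is stuck. */
static int mock_disable_ctrl(struct mock_dev *dev)
{
	/* Any negative value signals failure in this sketch. */
	return dev->stuck ? -1 : 0;
}

/* Stand-in for pcie_reset_flr(): assume the FLR clears the stuck state. */
static int mock_reset_flr(struct mock_dev *dev)
{
	dev->flr_count++;
	dev->stuck = false;
	return 0;
}

/* Same ordering as the patch: reset, escalate to FLR on failure, retry. */
static int mock_configure_admin_queue(struct mock_dev *dev)
{
	int result = mock_disable_ctrl(dev);

	if (result < 0) {
		result = mock_reset_flr(dev);
		if (result < 0)
			return result;

		/* The real patch calls pci_restore_state() at this point. */
		result = mock_disable_ctrl(dev);
		if (result < 0)
			return result;

		printf("controller reset completed after pcie flr\n");
	}

	return 0;
}
```

In this sketch a healthy device never reaches the FLR path, and a stuck one goes through it once before the retry, which is the behavior the patch is after.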