[PATCH v1 1/1] nvme-pci: tear down controller on ERS permanent failure
Xixin Liu
liuxixin at kylinos.cn
Tue Jun 9 19:55:00 PDT 2026

On pci_channel_io_frozen, nvme-pci moves the controller to
NVME_CTRL_RESETTING and quiesces I/O. If pcie_do_recovery() then fails and
the PCI core calls error_detected() with pci_channel_io_perm_failure, the
driver only logs a warning and returns PCI_ERS_RESULT_DISCONNECT without
changing the controller state. The sysfs state stays "resetting",
nvme_fail_nonready_command() keeps returning BLK_STS_RESOURCE, and new I/O
blocks indefinitely.

Reproduced with an aer_inject Uncorrectable Fatal error on a QEMU hotplug
NVMe device when pci_bus_error_reset() failed (LnkCap.DLLLARC set,
LnkSta.DLLLA not set within 100 ms). slot_reset was never called.

Add nvme_pci_ers_failed(), which runs the same teardown sequence as the
reset_work failure path: move to DELETING, disable the device with
shutdown, sync the queues, mark namespaces dead, unquiesce the I/O queues,
then move to DEAD.

Signed-off-by: Xixin Liu <liuxixin at kylinos.cn>
---
drivers/nvme/host/pci.c | 11 +++++++++++
1 file changed, 11 insertions(+)
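
Note (not part of the commit): the behaviour described in the commit
message can be sketched as a small user-space model. This is only an
illustration under simplifying assumptions, not driver code. All model_*
names are hypothetical, and the verdict function leaves out the failfast
and multipath checks that the real nvme_fail_nonready_command() also
performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the controller states named in the patch. */
enum model_state { MODEL_RESETTING, MODEL_DELETING, MODEL_DEAD };

/* What the model does with a command that arrives while not ready. */
enum model_verdict { MODEL_REQUEUE, MODEL_FAIL_FAST };

struct model_ctrl {
	enum model_state state;
	bool io_quiesced;
};

/*
 * Model of the non-ready check: the command is requeued (BLK_STS_RESOURCE
 * in the driver) unless the controller is being deleted or is dead.
 * Assumption: failfast and multipath handling are ignored here.
 */
static enum model_verdict model_nonready_verdict(const struct model_ctrl *ctrl)
{
	assert(ctrl != NULL);
	if (ctrl->state == MODEL_DELETING || ctrl->state == MODEL_DEAD)
		return MODEL_FAIL_FAST;
	return MODEL_REQUEUE;
}

/* pci_channel_io_frozen: enter RESETTING and quiesce I/O. */
static void model_frozen(struct model_ctrl *ctrl)
{
	ctrl->state = MODEL_RESETTING;
	ctrl->io_quiesced = true;
}

/* Teardown order from the patch, reduced to its visible effects. */
static void model_ers_failed(struct model_ctrl *ctrl)
{
	ctrl->state = MODEL_DELETING;
	ctrl->io_quiesced = false;	/* queues are unquiesced before DEAD */
	ctrl->state = MODEL_DEAD;
}
```

In this model, calling model_frozen() alone leaves every new command in
the requeue path, which corresponds to the hang described above. Calling
model_ers_failed() afterwards makes the same check fail the command
immediately.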
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cebe8c9e598c..6a9c9c654d0a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -4032,6 +4032,16 @@ static const struct dev_pm_ops nvme_dev_pm_ops = {
 };
 #endif /* CONFIG_PM_SLEEP */
 
+static void nvme_pci_ers_failed(struct nvme_dev *dev)
+{
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
+	nvme_dev_disable(dev, true);
+	nvme_sync_queues(&dev->ctrl);
+	nvme_mark_namespaces_dead(&dev->ctrl);
+	nvme_unquiesce_io_queues(&dev->ctrl);
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
+}
+
 static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 						pci_channel_state_t state)
 {
@@ -4057,6 +4067,7 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	case pci_channel_io_perm_failure:
 		dev_warn(dev->ctrl.device,
 			"failure state error detected, request disconnect\n");
+		nvme_pci_ers_failed(dev);
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
 	return PCI_ERS_RESULT_NEED_RESET;
--
2.43.0