[PATCH v1 0/1] nvme-pci: tear down controller on ERS permanent failure

Xixin Liu liuxixin at kylinos.cn
Tue Jun 9 19:55:00 PDT 2026


Hi,

This series fixes nvme-pci leaving the controller in NVME_CTRL_RESETTING
after PCIe ERS reports pci_channel_io_perm_failure.

On pci_channel_io_frozen the driver sets NVME_CTRL_RESETTING and quiesces
I/O, expecting slot_reset to restart the controller.  When pcie_do_recovery()
fails, the core reports pci_channel_io_perm_failure instead and slot_reset
is never invoked.

Reproduced on a QEMU 8.2.0 hotplug NVMe device when pci_bus_error_reset()
fails: the Root Port LnkCap advertises DLLLARC, but LnkSta.DLLLA is not set
within 100 ms after the secondary bus reset ("Data Link Layer Link Active
not set in 100 msec").  Unpatched nvme-pci only logs the event and returns
PCI_ERS_RESULT_DISCONNECT.  The sysfs state stays "resetting" and new I/O
gets BLK_STS_RESOURCE, so dd and nvme list hang in uninterruptible D state.

Reuse the same teardown as the reset_work error path (DELETING, disable
with shutdown, mark namespaces dead, DEAD) so I/O fails immediately
instead of blocking.
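
For illustration only, the state flow described above can be modelled in
user space.  This is a sketch, not the patch and not kernel code: every
model_* name is hypothetical, the flags are simplified stand-ins for the
driver's real bookkeeping, and MODEL_IO_REQUEUE merely represents the
BLK_STS_RESOURCE case in which the submitter keeps waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the controller states named in this letter. */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_DELETING,
	MODEL_CTRL_DEAD,
};

/* Hypothetical stand-ins for the PCI channel states and handler results. */
enum model_channel_state { MODEL_IO_NORMAL, MODEL_IO_FROZEN, MODEL_IO_PERM_FAILURE };
enum model_ers_result { MODEL_ERS_CAN_RECOVER, MODEL_ERS_NEED_RESET, MODEL_ERS_DISCONNECT };

/* Outcome of submitting one request in the model. */
enum model_io_status { MODEL_IO_OK, MODEL_IO_REQUEUE, MODEL_IO_ERROR };

struct model_ctrl {
	enum model_ctrl_state state;
	bool quiesced;        /* I/O queues stopped while awaiting slot_reset */
	bool shut_down;       /* controller disabled with shutdown */
	bool namespaces_dead; /* namespaces marked dead, so I/O fails fast */
};

/* The four teardown steps listed above, in the same order. */
void model_teardown(struct model_ctrl *ctrl)
{
	assert(ctrl != NULL);
	ctrl->state = MODEL_CTRL_DELETING;
	ctrl->shut_down = true;
	ctrl->namespaces_dead = true;
	ctrl->state = MODEL_CTRL_DEAD;
}

/*
 * Model of the error_detected callback.  teardown_on_perm_failure selects
 * between the unpatched behaviour (false) and the proposed one (true).
 */
enum model_ers_result model_error_detected(struct model_ctrl *ctrl,
					   enum model_channel_state chan,
					   bool teardown_on_perm_failure)
{
	assert(ctrl != NULL);
	switch (chan) {
	case MODEL_IO_NORMAL:
		return MODEL_ERS_CAN_RECOVER;
	case MODEL_IO_FROZEN:
		ctrl->state = MODEL_CTRL_RESETTING;
		ctrl->quiesced = true;
		return MODEL_ERS_NEED_RESET;
	case MODEL_IO_PERM_FAILURE:
		if (teardown_on_perm_failure)
			model_teardown(ctrl);
		return MODEL_ERS_DISCONNECT;
	}
	return MODEL_ERS_DISCONNECT;
}

/* Dead namespaces fail at once; a quiesced or resetting controller requeues. */
enum model_io_status model_submit_io(const struct model_ctrl *ctrl)
{
	assert(ctrl != NULL);
	if (ctrl->namespaces_dead)
		return MODEL_IO_ERROR;
	if (ctrl->quiesced || ctrl->state == MODEL_CTRL_RESETTING)
		return MODEL_IO_REQUEUE;
	return MODEL_IO_OK;
}
```

Feeding the model MODEL_IO_FROZEN followed by MODEL_IO_PERM_FAILURE leaves
the controller in MODEL_CTRL_RESETTING with requests requeued when the flag
is false, and in MODEL_CTRL_DEAD with requests failing when it is true.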

Thanks,
Xixin Liu

---

Xixin Liu (1):
  nvme-pci: tear down controller on ERS permanent failure

 drivers/nvme/host/pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

-- 
2.43.0



