[PATCH v1 0/1] nvme-pci: tear down controller on ERS permanent failure
Xixin Liu
liuxixin at kylinos.cn
Tue Jun 9 19:55:00 PDT 2026
Hi,

This series fixes nvme-pci leaving the controller stuck in
NVME_CTRL_RESETTING after PCIe ERS reports pci_channel_io_perm_failure.

On pci_channel_io_frozen the driver sets NVME_CTRL_RESETTING and quiesces
I/O, expecting slot_reset to restart the controller. When
pcie_do_recovery() fails, the core reports perm_failure instead.

Reproduced on a QEMU 8.2.0 hotplug NVMe device when
pci_bus_error_reset() fails: the Root Port LnkCap advertises DLLLARC, but
LnkSta.DLLLA is never set within 100 ms after the secondary bus reset
("Data Link Layer Link Active not set in 100 msec"). slot_reset is never
invoked, and unpatched nvme-pci only logs the event and returns
DISCONNECT. The sysfs state stays "resetting" and new I/O gets
BLK_STS_RESOURCE, so dd and nvme list hang in uninterruptible D state.

Reuse the same teardown as the reset_work error path (DELETING, disable
with shutdown, mark namespaces dead, DEAD) so that I/O fails immediately
instead of blocking.
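
For illustration only, the state transitions described above can be
modeled in a few lines of userspace C. This is not the patch and not
kernel code: every name below (model_ctrl, model_error_detected,
model_teardown, model_submit_io, the enum values and the "patched" flag)
is a hypothetical stand-in, and disabling the device is reduced to a
comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the controller states named above. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_DELETING, CTRL_DEAD };

/* Hypothetical stand-ins for the channel states passed to error_detected. */
enum channel_state { CH_IO_NORMAL, CH_IO_FROZEN, CH_IO_PERM_FAILURE };

/* Hypothetical stand-ins for the recovery results a driver can return. */
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT };

/* What happens to a newly submitted request in this model. */
enum io_outcome { IO_OK, IO_BLOCKED, IO_FAILED };

struct model_ctrl {
	enum ctrl_state state;
	bool queues_quiesced;	/* new I/O is held back, not failed */
	bool namespaces_dead;	/* new I/O is failed immediately */
};

/* Teardown order from the cover letter: DELETING, disable, dead, DEAD. */
static void model_teardown(struct model_ctrl *c)
{
	assert(c != NULL);
	c->state = CTRL_DELETING;
	/* "disable with shutdown" has no equivalent in this model */
	c->namespaces_dead = true;
	c->state = CTRL_DEAD;
}

/* Model of the error_detected callback, with and without the fix. */
static enum ers_result model_error_detected(struct model_ctrl *c,
					    enum channel_state ch,
					    bool patched)
{
	switch (ch) {
	case CH_IO_NORMAL:
		return ERS_CAN_RECOVER;
	case CH_IO_FROZEN:
		/* Wait for slot_reset: mark resetting and hold new I/O. */
		c->state = CTRL_RESETTING;
		c->queues_quiesced = true;
		return ERS_NEED_RESET;
	case CH_IO_PERM_FAILURE:
		/* Unpatched: nothing changes, the state stays RESETTING. */
		if (patched)
			model_teardown(c);
		return ERS_DISCONNECT;
	}
	return ERS_NEED_RESET;
}

/* Dead namespaces fail I/O at once; quiesced queues leave it blocked. */
static enum io_outcome model_submit_io(const struct model_ctrl *c)
{
	if (c->namespaces_dead)
		return IO_FAILED;
	if (c->queues_quiesced)
		return IO_BLOCKED;
	return IO_OK;
}
```

Feeding the model a frozen event followed by a perm_failure event with
patched set to false leaves it in CTRL_RESETTING with I/O blocked, which
corresponds to the hang reported above. With patched set to true it ends
in CTRL_DEAD and model_submit_io() returns IO_FAILED.
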
Thanks,
Xixin Liu
---
Xixin Liu (1):
  nvme-pci: tear down controller on ERS permanent failure

 drivers/nvme/host/pci.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

--
2.43.0