[PATCH v3 0/1] nvme-pci: recover from NVM subsystem reset
Nilay Shroff
nilay at linux.ibm.com
Tue Jun 4 02:10:03 PDT 2024
Hi Keith,
My previous attempt to draw attention to this patch didn't garner enough
eyeballs, so I have rewritten the text and included some more background
on it. For those interested, I have also copied below the link to the
previous email where we had some discussion about this patch.
The NVM subsystem reset command might be needed to activate an NVMe
controller firmware image after the image is committed to a slot or,
in some cases, to recover from a controller fatal error. When executed,
the NVM subsystem reset may cause the loss of communication with the
NVMe controller, and the only ways to re-establish communication with
the NVMe adapter are to re-enumerate the PCI bus, hotplug the NVMe disk,
or reboot the OS. Fortunately, the PPC architecture supports an
extended PCI capability which can help recover from the loss of PCI
adapter communication. The EEH (Enhanced Error Handling) hardware
features on PPC machines allow PCI bus errors to be cleared and a PCI
card to be "rebooted" without having to reboot the OS, re-enumerate the
PCI bus, or hotplug the NVMe disk.
In the current implementation, when a user executes the NVM subsystem
reset command, the kernel programs the NVM subsystem reset register
(NSSR) and then initiates the nvme reset work. The nvme reset work first
shuts down the controller, which requires access to PCIe config space.
As programming NSSR typically causes the loss of communication with the
NVMe controller, the nvme reset work that immediately follows fails to
read/write PCIe config space. That makes the nvme driver believe the
controller is dead, so the driver cleans up all resources associated
with that NVMe controller and marks the controller dead. As a result,
PCI error recovery (EEH on PPC) doesn't get a chance to recover the
device from the loss of adapter communication.
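The failure mode above can be illustrated with a small user-space model.
This is only a sketch, not driver code: it assumes, as is conventional
on PCI, that a read from a device which no longer responds completes
with all bits set, and the names PCI_NO_RESPONSE and
prepatch_reset_marks_dead() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: a read from a PCI device that no longer responds
 * completes with all bits set.
 */
#define PCI_NO_RESPONSE 0xFFFFFFFFu

/*
 * Hypothetical model of the reset work without this patch: once a
 * register read looks like a lost link, the controller is declared
 * dead, whether or not platform error recovery has already started.
 */
static bool prepatch_reset_marks_dead(uint32_t reg_value, bool recovery_started)
{
	(void)recovery_started;	/* not consulted in this model */
	return reg_value == PCI_NO_RESPONSE;
}
```

In this model the second argument never changes the result, which is the
gap described above: recovery may be pending, yet the controller is
still marked dead.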
This patch detects the case where communication with the NVMe adapter
is lost and PCI error recovery has been initiated by the platform. In
that case it allows error recovery to make forward progress and thus
prevents the nvme reset work (which was initiated after the NVM
subsystem reset) from marking the controller dead. If PCI error recovery
is unable to recover the device, it sets the PCI channel state to
"permanent failure" and helps remove the device.
I have tested the following cases with this patch applied:
1. NVM subsystem reset while no IO is running
2. NVM subsystem reset while IO is ongoing
3. Inject PCI error while reset work is scheduled and no IO is running
4. Inject PCI error while reset work is scheduled and IO is ongoing
For all of the above cases (1-4), I verified that PCI error recovery
could successfully recover the nvme disk.
5. NVM subsystem reset and then immediately hot remove the NVMe disk:
In this case, though PCI error recovery is initiated, it cannot make
forward progress (as the disk is hot removed), so the controller is
deleted and all its associated resources are freed.
6. NVM subsystem reset and PCI error recovery is unable to recover the
device:
In this case the controller is deleted and all its associated resources
are freed.
7. NVM subsystem reset on a platform which doesn't support PCI error
recovery:
In this case the nvme reset work frees the resources associated with the
controller and marks it dead.
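The intended behaviour and the outcomes of cases 1-7 can be summarised
in a small user-space model. This is a sketch under simplifying
assumptions, not the patch itself: the enum values and both functions
are hypothetical, and the platform's recovery state is reduced to two
booleans.

```c
#include <stdbool.h>

/* Hypothetical outcomes for the controller in this model. */
enum ctrl_outcome {
	CTRL_LIVE,	/* controller is usable again */
	CTRL_DEFERRED,	/* reset work steps aside for PCI error recovery */
	CTRL_DEAD,	/* reset work marked the controller dead */
	CTRL_REMOVED,	/* recovery gave up and the device was removed */
};

/* What the reset work concludes once its reset attempt has finished. */
static enum ctrl_outcome reset_work_outcome(bool reset_ok,
					    bool recovery_in_progress)
{
	if (reset_ok)
		return CTRL_LIVE;
	if (recovery_in_progress)
		return CTRL_DEFERRED;	/* do not mark the controller dead */
	return CTRL_DEAD;		/* no recovery in flight (case 7) */
}

/* End state once any deferred PCI error recovery has also finished. */
static enum ctrl_outcome final_outcome(bool reset_ok,
				       bool recovery_in_progress,
				       bool recovery_succeeds)
{
	enum ctrl_outcome o = reset_work_outcome(reset_ok, recovery_in_progress);

	if (o != CTRL_DEFERRED)
		return o;
	/* Cases 1-4 recover; cases 5 and 6 end in removal. */
	return recovery_succeeds ? CTRL_LIVE : CTRL_REMOVED;
}
```

For example, final_outcome(false, true, true) corresponds to cases 1-4,
final_outcome(false, true, false) to cases 5 and 6, and
final_outcome(false, false, false) to case 7.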
Changelog:
==========
Changes from v2:
- Formatting cleanup
- Updated commit changelog to better describe the issue
- Added the cover letter to add more details about nvme
subsystem reset and error recovery (EEH)
Changes from v1:
- Allow a controller to move from CONNECTING state to
RESETTING state (Keith)
- Fix race condition between reset work and PCI error handler
code which may prevent the reset work and PCI recovery from
making forward progress (Keith)
Link: https://lore.kernel.org/all/20240209050342.406184-1-nilay@linux.ibm.com/
Nilay Shroff (1):
nvme-pci : Fix EEH failure on ppc after subsystem reset
drivers/nvme/host/core.c | 1 +
drivers/nvme/host/pci.c | 20 +++++++++++++++++---
2 files changed, 18 insertions(+), 3 deletions(-)
--
2.45.1