[PATCH v3 0/1] nvme-pci: recover from NVM subsystem reset

Tue Jun 4 02:10:03 PDT 2024

Hi Keith,

My previous attempt to get attention for this patch didn't garner enough
eyeballs, so I have rewritten the text and included some more background.
For those interested, I have also copied below the link to the previous
email where we discussed this patch.

An NVM subsystem reset may be needed to activate an NVMe controller
firmware image after the image is committed to a slot or, in some
cases, to recover from a controller fatal error. When executed, the
NVM subsystem reset may cause loss of communication with the NVMe
controller. The only ways to re-establish communication with the NVMe
adapter are then to re-enumerate the PCI bus, hotplug the NVMe disk, or
reboot the OS. Fortunately, the PPC architecture supports an extended
PCI capability which can help recover from the loss of PCI adapter
communication. The EEH (Enhanced Error Handling) hardware features on
PPC machines allow PCI bus errors to be cleared and a PCI card to be
"rebooted" without having to reboot the OS, re-enumerate the PCI bus,
or hotplug the NVMe disk.

In the current implementation, when a user executes the NVM subsystem
reset command, the kernel programs the NVM Subsystem Reset register
(NSSR) and then initiates the nvme reset work. The nvme reset work first
shuts down the controller, which requires access to PCIe config space.
As programming NSSR typically causes loss of communication with the
NVMe controller, the nvme reset work that immediately follows fails to
read or write PCIe config space. The nvme driver then believes the
controller is dead, so it cleans up all resources associated with that
NVMe controller and marks the controller dead. As a result, PCI error
recovery (EEH on PPC) doesn't get a chance to recover the device from
the loss of adapter communication.
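
To illustrate why the reset work concludes that the controller is gone,
here is a minimal user-space sketch. It assumes the usual PCI behaviour
that a read from a device which no longer responds returns all ones.
The helper names model_read_csts() and looks_unreachable() are
hypothetical and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a controller register read; not a kernel API. */
static uint32_t model_read_csts(bool link_up, uint32_t real_value)
{
	/* Assumption: a read from an unresponsive device returns all ones. */
	return link_up ? real_value : UINT32_MAX;
}

/* An all-ones value is treated as "device not reachable" in this model. */
static bool looks_unreachable(uint32_t csts)
{
	return csts == UINT32_MAX;
}
```

In this model, a read issued right after NSSR is programmed yields all
ones, which is indistinguishable from a dead or removed device unless
the PCI channel state is also consulted.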

This patch detects the case where communication with the NVMe adapter
is lost and PCI error recovery has been initiated by the platform. In
that case it allows error recovery to make forward progress and
prevents the nvme reset work (initiated after the NVM subsystem reset)
from marking the controller dead. If PCI error recovery is unable to
recover the device, it sets the PCI channel state to "permanent
failure" and helps remove the device.
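
The intended decision can be sketched as a small user-space model. This
is not the code from the patch: the enum and function names below are
hypothetical, and the channel states only mirror the kernel's
pci_channel_state values (normal, frozen, permanent failure).

```c
#include <stdbool.h>

/* User-space model only; mirrors the kernel's PCI channel states. */
enum chan_state {
	CHAN_IO_NORMAL,		/* no error recovery in progress */
	CHAN_IO_FROZEN,		/* platform (e.g. EEH) is recovering the device */
	CHAN_IO_PERM_FAILURE,	/* recovery has given up on the device */
};

/* Hypothetical outcomes of the reset work; not kernel identifiers. */
enum reset_outcome {
	RESET_PROCEED,			/* controller reachable, continue reset */
	RESET_DEFER_TO_RECOVERY,	/* let PCI error recovery handle it */
	RESET_MARK_DEAD,		/* free resources, mark controller dead */
};

static enum reset_outcome reset_work_outcome(bool ctrl_reachable,
					     enum chan_state state)
{
	if (ctrl_reachable)
		return RESET_PROCEED;
	/* Communication lost, but the platform is recovering the device. */
	if (state == CHAN_IO_FROZEN)
		return RESET_DEFER_TO_RECOVERY;
	/* No recovery in progress, or recovery failed permanently. */
	return RESET_MARK_DEAD;
}
```

With this model, an unreachable controller on a platform without PCI
error recovery (channel state stays normal) is still marked dead, which
matches the behaviour before this patch.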

I have tested the following cases with this patch applied:
1. NVM subsystem reset while no IO is running 
2. NVM subsystem reset while IO is ongoing
3. Inject PCI error while reset work is scheduled and no IO is running
4. Inject PCI error while reset work is scheduled and IO is ongoing

   For all of the above cases (1-4), I verified that PCI error recovery
   successfully recovered the nvme disk.

5. NVM subsystem reset and then immediately hot remove the NVMe disk:
   In this case, though PCI error recovery is initiated, it can't make
   forward progress (as the disk is hot removed), so the controller is
   deleted and all its associated resources are freed.

6. NVM subsystem reset and PCI error recovery is unable to recover the
   device:
   In this case the controller is deleted and all its associated
   resources are freed.

7. NVM subsystem reset on a platform which doesn't support PCI error
   recovery:
   In this case the nvme reset work frees the resources associated with
   the controller and marks it dead.

Changelog:
==========
Changes from v2:
  - Formatting cleanup
  - Updated commit changelog to better describe the issue
  - Added the cover letter to add more details about nvme
    subsystem reset and error recovery (EEH)

Changes from v1:
  - Allow a controller to move from CONNECTING state to
    RESETTING state (Keith)

  - Fix race condition between reset work and pci error handler
    code which may prevent reset work and pci recovery from
    making forward progress (Keith)

Link: https://lore.kernel.org/all/20240209050342.406184-1-nilay@linux.ibm.com/

Nilay Shroff (1):
  nvme-pci : Fix EEH failure on ppc after subsystem reset

 drivers/nvme/host/core.c |  1 +
 drivers/nvme/host/pci.c  | 20 +++++++++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

-- 
2.45.1