[PATCH v3 1/1] nvme-pci: Fix EEH failure on ppc after subsystem reset

Nilay Shroff nilay at linux.ibm.com
Tue Jun 4 02:10:04 PDT 2024


Executing an NVMe subsystem reset command may cause the NVMe adapter to
lose communication with the kernel. Today, the only ways to recover the
adapter are to re-enumerate the PCI bus, hotplug the NVMe disk, or
reboot the OS.
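
For reference, the subsystem reset described above is typically triggered
from userspace with "nvme subsystem-reset /dev/nvme0" (nvme-cli), which
boils down to the NVME_IOCTL_SUBSYS_RESET ioctl on the controller
character device. A minimal sketch, assuming /dev/nvme0 is the controller
node (illustration only, not part of this patch):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	int fd = open("/dev/nvme0", O_RDWR);

	if (fd < 0) {
		perror("open /dev/nvme0");
		return 1;
	}
	/* ask the driver to perform an NVMe subsystem reset (NSSR write) */
	if (ioctl(fd, NVME_IOCTL_SUBSYS_RESET) < 0)
		perror("NVME_IOCTL_SUBSYS_RESET");
	close(fd);
	return 0;
}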

The PPC architecture supports a mechanism called EEH (Enhanced Error
Handling) which allows PCI bus errors to be cleared and a PCI card to
be reset, without having to physically hotplug the NVMe disk or reboot
the OS.
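
For context, on ppc the EEH core drives recovery through the generic PCI
error recovery callbacks that the nvme-pci driver already registers,
roughly as below (taken from drivers/nvme/host/pci.c; the exact callback
set may vary by kernel version):

static const struct pci_error_handlers nvme_err_handler = {
	.error_detected	= nvme_error_detected,
	.slot_reset	= nvme_slot_reset,
	.resume		= nvme_error_resume,
	.reset_prepare	= nvme_reset_prepare,
	.reset_done	= nvme_reset_done,
};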

In the current implementation, when the user executes the nvme subsystem
reset command and the kernel loses communication with the NVMe adapter,
subsequent reads/writes to the PCIe config space of the device fail. A
failing config space access makes the NVMe driver assume a permanent
loss of communication with the device, so the driver marks the NVMe
controller dead and frees all resources associated with that controller.
Once the controller is marked dead, EEH recovery can no longer succeed.
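
To illustrate the failure mode (hypothetical helper, not the driver's
actual code): once the link is down, a config space read completes with
all 1s, which is what leads the driver to treat the device as permanently
gone rather than as a recoverable EEH event:

/* needs <linux/pci.h>; sketch only, the helper name is made up */
static bool nvme_link_appears_dead(struct pci_dev *pdev)
{
	u16 vendor;

	if (pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor))
		return true;		/* config access itself failed */

	return vendor == 0xffff;	/* all-ones read: nothing responded */
}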

This patch fixes the issue: if communication with the NVMe adapter is
lost after the user executes a subsystem reset command and EEH recovery
is initiated, we allow the EEH recovery to make forward progress and
give the EEH thread a fair chance to recover the adapter. If the EEH
thread cannot recover the adapter communication, it sets the PCI channel
state of the erring adapter to "permanent failure" and removes the
device.

Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
---
 drivers/nvme/host/core.c |  1 +
 drivers/nvme/host/pci.c  | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f5d150c62955..afb8419566a9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -562,6 +562,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		switch (old_state) {
 		case NVME_CTRL_NEW:
 		case NVME_CTRL_LIVE:
+		case NVME_CTRL_CONNECTING:
 			changed = true;
 			fallthrough;
 		default:
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 102a9fb0c65f..f1bb8df20701 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2789,6 +2789,17 @@ static void nvme_reset_work(struct work_struct *work)
  out_unlock:
 	mutex_unlock(&dev->shutdown_lock);
  out:
+	/*
+	 * If PCI recovery is ongoing then let it finish first
+	 */
+	if (pci_channel_offline(to_pci_dev(dev->dev))) {
+		if (nvme_ctrl_state(&dev->ctrl) == NVME_CTRL_RESETTING ||
+		    nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
+			dev_warn(dev->ctrl.device,
+				"Let pci error recovery finish!\n");
+			return;
+		}
+	}
 	/*
 	 * Set state to deleting now to avoid blocking nvme_wait_reset(), which
 	 * may be holding this pci_dev's device lock.
@@ -3308,10 +3319,14 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	case pci_channel_io_frozen:
 		dev_warn(dev->ctrl.device,
 			"frozen state error detected, reset controller\n");
-		if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
-			nvme_dev_disable(dev, true);
-			return PCI_ERS_RESULT_DISCONNECT;
+		if (nvme_ctrl_state(&dev->ctrl) != NVME_CTRL_RESETTING) {
+			if (!nvme_change_ctrl_state(&dev->ctrl,
+					NVME_CTRL_RESETTING)) {
+				nvme_dev_disable(dev, true);
+				return PCI_ERS_RESULT_DISCONNECT;
+			}
 		}
+		flush_work(&dev->ctrl.reset_work);
 		nvme_dev_disable(dev, false);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
-- 
2.45.1