[PATCHv2] nvme-pci: Fix EEH failure on ppc after subsystem reset
Nilay Shroff
nilay at linux.ibm.com
Mon Apr 8 03:26:29 PDT 2024
If an nvme subsystem reset causes the loss of communication to the nvme
adapter, then EEH could potentially recover the adapter. The detection of
communication loss to the adapter only happens when the nvme driver
attempts to read an MMIO register.
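For illustration, a minimal sketch of what such a detection looks like;
the helper below is hypothetical and not part of this patch. A read from
an isolated device returns all-ones, and on EEH-capable platforms (e.g.
ppc64) pci_channel_offline() reports the isolation once the platform has
frozen the device:

/*
 * Hypothetical helper (illustration only, not in this patch): a failed
 * MMIO read exposes a lost link. Reads from an isolated device return
 * all-ones, and pci_channel_offline() confirms the platform has frozen
 * the device.
 */
static bool nvme_link_is_dead(struct nvme_dev *dev)
{
	u32 csts = readl(dev->bar + NVME_REG_CSTS);

	return csts == ~0U && pci_channel_offline(to_pci_dev(dev->dev));
}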
The nvme subsystem reset command writes 0x4E564D65 ("NVMe") to the NSSR
register and schedules an adapter reset. If the subsystem reset causes the
loss of communication to the nvme adapter, then either the IO timeout
handler or the adapter reset handler could detect it. When the IO timeout
handler detects the loss of communication, the EEH handler is able to
recover the communication to the adapter; this was implemented in commit
651438bb0af5 ("nvme-pci: Fix EEH failure on ppc"). However, if the
communication loss is detected during nvme reset work, then EEH is unable
to successfully finish the adapter recovery.
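For reference, a rough sketch of the subsystem reset entry point (modeled
on nvme_reset_subsystem(); the reset scheduling step is shown as described
above and may not match the exact call site):

/*
 * Rough sketch of the subsystem reset path (modeled on
 * nvme_reset_subsystem() in drivers/nvme/host/nvme.h; the exact reset
 * scheduling call site is an assumption). 0x4E564D65 is ASCII "NVMe".
 */
static int nvme_subsys_reset_sketch(struct nvme_ctrl *ctrl)
{
	int ret;

	if (!ctrl->subsystem)
		return -ENOTTY;

	/* Write the reset magic to the NSSR register over MMIO. */
	ret = ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 0x4E564D65);
	if (ret)
		return ret;

	/* Schedule the adapter reset, which runs nvme_reset_work(). */
	return nvme_reset_ctrl(ctrl);
}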
This patch ensures that:
- nvme reset work can observe that the pci channel is offline (at least on
platforms which support EEH recovery) after a failed MMIO read, and in
that case it stops making forward progress instead of marking the
controller state DEAD. Thus we give the EEH handler a fair chance to
recover the nvme adapter.
- if a pci channel "frozen" error is detected while the controller is
already in the RESETTING state, then we don't try (re-)setting the
controller state to RESETTING, which would otherwise obviously fail and
make us prematurely break out of the EEH recovery handling.
- if a pci channel "frozen" error is detected while reset work is in
progress, then we wait until the reset work is finished before proceeding
with disabling the nvme device. This ensures that the reset work doesn't
race with the pci error handler code and that both the error handler and
the reset work make forward progress without blocking each other (the
error handler callbacks involved are listed below for context).
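For context, nvme_error_detected() modified below is invoked by the
EEH/AER recovery core through the driver's pci error handlers, registered
in drivers/nvme/host/pci.c roughly as follows (reproduced from memory, so
the field set may differ slightly):

static const struct pci_error_handlers nvme_err_handler = {
	.error_detected	= nvme_error_detected,
	.slot_reset	= nvme_slot_reset,
	.resume		= nvme_error_resume,
	.reset_prepare	= nvme_reset_prepare,
	.reset_done	= nvme_reset_done,
};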
Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
---
Changes from v1:
- Allow a controller to reset from a connecting state (Keith)
- Fix a race condition between reset work and the pci error handler
  code which could otherwise keep reset work and EEH recovery from
  making forward progress (Keith)
drivers/nvme/host/core.c | 1 +
drivers/nvme/host/pci.c | 19 ++++++++++++++++---
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 27281a9a8951..b3fe1a02c171 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -557,6 +557,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
switch (old_state) {
case NVME_CTRL_NEW:
case NVME_CTRL_LIVE:
+ case NVME_CTRL_CONNECTING:
changed = true;
fallthrough;
default:
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8e0bb9692685..553bf0ec5f5c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2776,6 +2776,16 @@ static void nvme_reset_work(struct work_struct *work)
out_unlock:
mutex_unlock(&dev->shutdown_lock);
out:
+ /*
+ * If PCI recovery is ongoing then let it finish first
+ */
+ if (pci_channel_offline(to_pci_dev(dev->dev))) {
+ if (nvme_ctrl_state(&dev->ctrl) == NVME_CTRL_RESETTING ||
+ nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
+ dev_warn(dev->ctrl.device, "Let pci error recovery finish!\n");
+ return;
+ }
+ }
/*
* Set state to deleting now to avoid blocking nvme_wait_reset(), which
* may be holding this pci_dev's device lock.
@@ -3295,10 +3305,13 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
case pci_channel_io_frozen:
dev_warn(dev->ctrl.device,
"frozen state error detected, reset controller\n");
- if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
- nvme_dev_disable(dev, true);
- return PCI_ERS_RESULT_DISCONNECT;
+ if (nvme_ctrl_state(&dev->ctrl) != NVME_CTRL_RESETTING) {
+ if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
+ nvme_dev_disable(dev, true);
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
}
+ flush_work(&dev->ctrl.reset_work);
nvme_dev_disable(dev, false);
return PCI_ERS_RESULT_NEED_RESET;
case pci_channel_io_perm_failure:
--
2.44.0