[PATCH v3 1/1] nvme-pci: Fix EEH failure on ppc after subsystem reset

Keith Busch <kbusch@kernel.org>
Fri Jun 21 09:37:50 PDT 2024


On Tue, Jun 04, 2024 at 02:40:04PM +0530, Nilay Shroff wrote:
> Executing the NVMe subsystem reset command may cause the kernel to
> lose communication with the NVMe adapter. Today the only ways to
> recover the adapter are to re-enumerate the PCI bus, hotplug the
> NVMe disk, or reboot the OS.
> 
> The PPC architecture supports a mechanism called EEH (enhanced error
> handling) which allows PCI bus errors to be cleared and a PCI card to
> be reset, without having to physically hotplug the NVMe disk or
> reboot the OS.
> 
> In the current implementation, when the user executes the nvme
> subsystem reset command and the kernel loses communication with the
> NVMe adapter, subsequent reads and writes to the PCIe config space
> of the device fail. Failing to read/write the PCI config space makes
> the NVMe driver assume a permanent loss of communication with the
> device, so the driver marks the NVMe controller dead and frees all
> resources associated with that controller. Once the NVMe controller
> is dead, the EEH recovery can't succeed.
> 
> This patch fixes the issue so that if, after the user executes the
> subsystem reset command, communication with the NVMe adapter is lost
> and EEH recovery is initiated, the EEH recovery is allowed to make
> forward progress and the EEH thread gets a fair chance to recover
> the adapter. If the EEH thread can't recover the adapter
> communication, it sets the PCI channel state of the erring adapter
> to "permanent failure" and removes the device.
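
For anyone following along: that EEH recovery is driven through the
same struct pci_error_handlers hooks that AER and DPC use, so the
controller has to still be around when they run. A simplified sketch
of what nvme-pci registers (the real callbacks live in
drivers/nvme/host/pci.c; the disable/warn details and the reset and
resume callbacks are elided here):

static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
					    pci_channel_state_t state)
{
	switch (state) {
	case pci_channel_io_normal:
		return PCI_ERS_RESULT_CAN_RECOVER;
	case pci_channel_io_frozen:
		/* MMIO blocked: quiesce and request a slot reset */
		return PCI_ERS_RESULT_NEED_RESET;
	case pci_channel_io_perm_failure:
		/* recovery failed: the device is going away */
		return PCI_ERS_RESULT_DISCONNECT;
	}
	return PCI_ERS_RESULT_NEED_RESET;
}

static const struct pci_error_handlers nvme_err_handler = {
	.error_detected	= nvme_error_detected,
	.slot_reset	= nvme_slot_reset,
	.resume		= nvme_error_resume,
	.reset_prepare	= nvme_reset_prepare,
	.reset_done	= nvme_reset_done,
};

None of these hooks can help if the driver has already torn the
controller down by the time error_detected() is called, which is the
scenario described above.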

I think the driver is trying to do too much here by handling the
subsystem reset inline with the reset request. It would surely fail,
but that was the idea, because we had been expecting pciehp to
re-enumerate the bus. But there are other possibilities, like your
EEH, and others like DPC and AER, which could do different handling
instead of a bus rescan. So perhaps the problem is just the subsystem
reset handling. Maybe just don't proceed with the reset handling, and
it'll be fine?
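
Something like the following (untested sketch):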

---
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 68b400f9c42d5..97ed33d9046d4 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -646,9 +646,12 @@ static inline void nvme_should_fail(struct request *req) {}
 
 bool nvme_wait_reset(struct nvme_ctrl *ctrl);
 int nvme_try_sched_reset(struct nvme_ctrl *ctrl);
+bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
+		enum nvme_ctrl_state new_state);
 
 static inline int nvme_reset_subsystem(struct nvme_ctrl *ctrl)
 {
+	u32 val;
 	int ret;
 
 	if (!ctrl->subsystem)
@@ -657,10 +660,10 @@ static inline int nvme_reset_subsystem(struct nvme_ctrl *ctrl)
 		return -EBUSY;
 
 	ret = ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 0x4E564D65);
-	if (ret)
-		return ret;
-
-	return nvme_try_sched_reset(ctrl);
+	nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
+	if (!ret)
+		ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &val);
+	return ret;
 }
 
 /*
@@ -786,8 +789,6 @@ blk_status_t nvme_host_path_error(struct request *req);
 bool nvme_cancel_request(struct request *req, void *data);
 void nvme_cancel_tagset(struct nvme_ctrl *ctrl);
 void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl);
-bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
-		enum nvme_ctrl_state new_state);
 int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown);
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
--
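
To spell out the idea: nvme_reset_subsystem() (what nvme-cli's
"nvme subsystem-reset /dev/nvme0" ends up invoking) still writes the
magic "NVMe" value 0x4E564D65 to NVME_REG_NSSR, but it no longer
schedules the controller reset work. The CSTS read afterwards is
there to flush the posted NSSR write; if the subsystem reset took the
link down, the platform's error handling (EEH, AER, DPC) then
observes the failure and drives recovery through the
pci_error_handlers above, instead of the driver's reset path racing
it and marking the controller dead. Moving the state straight back to
NVME_CTRL_LIVE leaves the error handlers free to do their usual state
transitions when they kick in. Untested, so take it as a sketch of
the direction rather than a finished patch.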


