[BUG][6.2.11] WD SN770 nvme controller is down

Keith Busch kbusch at kernel.org
Fri Apr 28 09:26:28 PDT 2023


On Fri, Apr 21, 2023 at 04:21:36PM -0600, Lyndon Sanche wrote:
> 
> I am wondering if there are any other steps I can take to troubleshoot this
> problem. I have tried taking the drive back to the store I bought it from,
> but the tests they ran all passed so a replacement/return is not likely to
> be possible. I am wondering if there are other settings or possibly patches
> I could try.
> 
> It should be noted that none of these problems occur on my desktop with the
> same model of drive (ASUS X570 Motherboard).

I do not know what causes this, but here's a shot in the dark at
recovery. Since you showed a valid PCI_STATUS, the link is still
viable, so maybe a function level reset (FLR) will make the device
MMIO-capable again?

If this helps at all (it may very well accomplish nothing), this
is still not a satisfying conclusion since we really want to avoid
the situation in the first place rather than recover from it.
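
For anyone who wants to reason about the condition outside the driver,
here is a rough userspace sketch of the check the patch makes. The
helper name flr_worth_trying() and the two macros are made up for
illustration and do not exist in the kernel. The assumption is that an
all-ones CSTS means MMIO reads are failing, and that a PCI_STATUS other
than all ones means config space still answers.

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones is what a failed MMIO or config read typically returns. */
#define CSTS_ALL_ONES       UINT32_MAX
#define PCI_STATUS_ALL_ONES UINT16_MAX

/*
 * Hypothetical helper, not part of the nvme driver.
 *
 * Returns true when CSTS reads back as all ones (MMIO looks dead)
 * while PCI_STATUS is something other than all ones (config space
 * still responds), which is the case where a reset might help.
 */
static bool flr_worth_trying(uint32_t csts, uint16_t pci_status)
{
	if (csts != CSTS_ALL_ONES)
		return false;	/* MMIO still responds; different failure */

	return pci_status != PCI_STATUS_ALL_ONES;
}
```

In the driver the same decision happens inline in nvme_warn_reset(),
and pci_try_reset_function() does the actual reset attempt.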

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 26e17110a419e..1278b8fd234c0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1272,6 +1272,11 @@ static void nvme_warn_reset(struct nvme_dev *dev, u32 csts)
 	if (csts != ~0)
 		return;
 
+	if (pci_status != (u16)~0U) {
+		pci_try_reset_function(to_pci_dev(dev->dev));
+		return;
+	}
+
 	dev_warn(dev->ctrl.device,
 		 "Does your device have a faulty power saving mode enabled?\n");
 	dev_warn(dev->ctrl.device,
--


