[PATCH] nvme-pci: fix resume after AER recovery
Christoph Hellwig
hch at lst.de
Mon Jan 30 02:14:49 PST 2023
All I/O on an nvme controller hangs after injecting a malformed TLP error
using aer-inject with an error file like:
--- snip ---
AER
PCI_ID WWWW:XX.YY.Z
UNCOR_STATUS COMP_TIME
HEADER_LOG 0 1 2 3
--- snip ---
This is because in this case the ->resume method will be called directly
after ->error_detected, without ->slot_reset in between, leaving the
controller in a disabled state and the queues frozen. Fix this by doing
the controller reset in ->resume as well.
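
The ordering problem described above can be sketched as a small userspace
model. This is only an illustration, not kernel code: every toy_* name, the
two-flag controller state, and the old_*/new_* handlers are hypothetical
stand-ins, and the assumption that a reset both re-enables the controller and
unfreezes the queues is a simplification made for the sketch.

```c
#include <stdbool.h>

/* Toy stand-in for the controller state the commit message talks about. */
struct toy_ctrl {
	bool enabled;		/* cleared when error handling disables the device */
	bool queues_frozen;	/* set while I/O is held back */
};

/* Hypothetical table, loosely shaped like struct pci_error_handlers. */
struct toy_handlers {
	void (*error_detected)(struct toy_ctrl *c);
	void (*slot_reset)(struct toy_ctrl *c);
	void (*resume)(struct toy_ctrl *c);
};

/* Models the error_detected step: disable the device, freeze the queues. */
static void toy_error_detected(struct toy_ctrl *c)
{
	c->enabled = false;
	c->queues_frozen = true;
}

/* Assumed effect of a controller reset: re-enable and unfreeze. */
static void toy_reset_ctrl(struct toy_ctrl *c)
{
	c->enabled = true;
	c->queues_frozen = false;
}

/* Before the patch: only slot_reset restarts the controller. */
static void old_slot_reset(struct toy_ctrl *c) { toy_reset_ctrl(c); }
/* Before the patch: resume only waited for an already queued reset. */
static void old_resume(struct toy_ctrl *c) { (void)c; }

/* After the patch: resume does the reset, slot_reset reuses it. */
static void new_resume(struct toy_ctrl *c) { toy_reset_ctrl(c); }
static void new_slot_reset(struct toy_ctrl *c) { new_resume(c); }

static const struct toy_handlers toy_old = {
	toy_error_detected, old_slot_reset, old_resume,
};
static const struct toy_handlers toy_new = {
	toy_error_detected, new_slot_reset, new_resume,
};

/*
 * Run one recovery sequence and report whether I/O could flow afterwards.
 * with_slot_reset == false models the case from the report, where resume
 * follows error_detected directly.
 */
static bool toy_recover(const struct toy_handlers *h, bool with_slot_reset)
{
	struct toy_ctrl c = { .enabled = true, .queues_frozen = false };

	h->error_detected(&c);
	if (with_slot_reset)
		h->slot_reset(&c);
	h->resume(&c);
	return c.enabled && !c.queues_frozen;
}
```

In this model toy_recover(&toy_old, false) is the only combination that
returns false, which corresponds to the hang reported here; with toy_new both
sequences end with the controller usable again.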
Fixes: a0a3408ee614 ("NVMe: Add pci error handlers")
Reported-by: Maciej Grochowski <Maciej.Grochowski at sony.com>
Signed-off-by: Christoph Hellwig <hch at lst.de>
Tested-by: Maciej Grochowski <Maciej.Grochowski at sony.com>
---
drivers/nvme/host/pci.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c734934c407ccf..ec1e95d1a8c236 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3336,21 +3336,19 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	return PCI_ERS_RESULT_NEED_RESET;
 }
 
-static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
+static void nvme_error_resume(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
 	dev_info(dev->ctrl.device, "restart after slot reset\n");
 	pci_restore_state(pdev);
 	nvme_reset_ctrl(&dev->ctrl);
-	return PCI_ERS_RESULT_RECOVERED;
 }
 
-static void nvme_error_resume(struct pci_dev *pdev)
+static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
 {
-	struct nvme_dev *dev = pci_get_drvdata(pdev);
-
-	flush_work(&dev->ctrl.reset_work);
+	nvme_error_resume(pdev);
+	return PCI_ERS_RESULT_RECOVERED;
 }
 
 static const struct pci_error_handlers nvme_err_handler = {
--
2.39.0