[PATCH] nvme-pci: fix stuck reset on concurrent DPC and HP
Nilay Shroff
nilay at linux.ibm.com
Fri Mar 7 04:58:28 PST 2025
On 3/7/25 5:54 AM, Keith Busch wrote:
> From: Keith Busch <kbusch at kernel.org>
>
> The PCIe DPC handling has the nvme driver quiesce the device, attempt to
> restart it, then wait for that restart to complete.
>
> The DPC event also toggles the PCIe link. If the slot doesn't have
> out-of-band presence detection, this will trigger a pciehp
> re-enumeration.
>
> The DPC's error handling that calls nvme_error_resume is holding the
> device lock while this happens. This lock prevents pciehp's request to
> disconnect the driver from proceeding.
>
> Meanwhile the nvme's reset can't make forward progress because its
> device isn't there anymore withoutstanding IO, and the timeout handler
> won't do anything to fix it because the device is undergoing error
> handling.
>
> End result: deadlocked.
>
> Fix this by having the timeout handler short cut the disabling for a
> disconnected PCIe device. The downside is that we're relying on an IO
> timeout to clean up this mess, which could be a minute by default.
>
> Signed-off-by: Keith Busch <kbusch at kernel.org>
> ---
> drivers/nvme/host/pci.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 640590b217282..5963a5f6da940 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1412,16 +1412,17 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
> struct request *abort_req;
> struct nvme_command cmd = { };
> u32 csts = readl(dev->bar + NVME_REG_CSTS);
> + struct pci_dev *pdev = to_pci_dev(dev->dev);
> u8 opcode;
>
> - if (nvme_state_terminal(&dev->ctrl))
> + if (nvme_state_terminal(&dev->ctrl) || pci_dev_is_disconnected(pdev))
> goto disable;
>
> /* If PCI error recovery process is happening, we cannot reset or
> * the recovery mechanism will surely fail.
> */
> mb();
> - if (pci_channel_offline(to_pci_dev(dev->dev)))
> + if (pci_channel_offline(pdev))
> return BLK_EH_RESET_TIMER;
>
> /*
> @@ -1522,9 +1523,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
>
> disable:
> if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
> - if (nvme_state_terminal(&dev->ctrl))
> + if (nvme_state_terminal(&dev->ctrl) ||
> + pci_dev_is_disconnected(pdev)) {
> nvme_dev_disable(dev, true);
> - return BLK_EH_DONE;
> + return BLK_EH_DONE;
> + }
> + return BLK_EH_RESET_TIMER;
> }
>
> nvme_dev_disable(dev, false);
This looks good. I have also tested this patch on PPC however as PPC doesn't support
AER/DPC, I tested it with concurrent errinjct/EEH tool and hotplug while IO is running.
Though one question: IMO, the DPC error handler shall invoke nvme_error_detected() prior
to nvme_error_resume(). And we already disable the device (and cancel in-flight IO) in
nvme_error_detected() and so wouldn't that help?
Thanks,
--Nilay
More information about the Linux-nvme
mailing list