[PATCH] nvme-pci: fix stuck reset on concurrent DPC and HP

Fri Mar 7 04:58:28 PST 2025

On 3/7/25 5:54 AM, Keith Busch wrote:
> From: Keith Busch <kbusch at kernel.org>
> 
> The PCIe DPC handling has the nvme driver quiesce the device, attempt to
> restart it, then wait for that restart to complete.
> 
> The DPC event also toggles the PCIe link. If the slot doesn't have
> out-of-band presence detection, this will trigger a pciehp
> re-enumeration.
> 
> The DPC's error handling that calls nvme_error_resume is holding the
> device lock while this happens. This lock prevents pciehp's request to
> disconnect the driver from proceeding.
> 
> Meanwhile the nvme's reset can't make forward progress because its
> device isn't there anymore withoutstanding IO, and the timeout handler
> won't do anything to fix it because the device is undergoing error
> handling.
> 
> End result: deadlocked.
> 
> Fix this by having the timeout handler short cut the disabling for a
> disconnected PCIe device. The downside is that we're relying on an IO
> timeout to clean up this mess, which could be a minute by default.
> 
> Signed-off-by: Keith Busch <kbusch at kernel.org>
> ---
>  drivers/nvme/host/pci.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 640590b217282..5963a5f6da940 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1412,16 +1412,17 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
>  	struct request *abort_req;
>  	struct nvme_command cmd = { };
>  	u32 csts = readl(dev->bar + NVME_REG_CSTS);
> +	struct pci_dev *pdev = to_pci_dev(dev->dev);
>  	u8 opcode;
>  
> -	if (nvme_state_terminal(&dev->ctrl))
> +	if (nvme_state_terminal(&dev->ctrl) || pci_dev_is_disconnected(pdev))
>  		goto disable;
>  
>  	/* If PCI error recovery process is happening, we cannot reset or
>  	 * the recovery mechanism will surely fail.
>  	 */
>  	mb();
> -	if (pci_channel_offline(to_pci_dev(dev->dev)))
> +	if (pci_channel_offline(pdev))
>  		return BLK_EH_RESET_TIMER;
>  
>  	/*
> @@ -1522,9 +1523,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
>  
>  disable:
>  	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
> -		if (nvme_state_terminal(&dev->ctrl))
> +		if (nvme_state_terminal(&dev->ctrl) ||
> +		    pci_dev_is_disconnected(pdev)) {
>  			nvme_dev_disable(dev, true);
> -		return BLK_EH_DONE;
> +			return BLK_EH_DONE;
> +		}
> +		return BLK_EH_RESET_TIMER;
>  	}
>  
>  	nvme_dev_disable(dev, false);

This looks good. I have also tested this patch on PPC however as PPC doesn't support 
AER/DPC, I tested it with concurrent  errinjct/EEH tool and hotplug while IO is running.

Though one question: IMO, the DPC error handler shall invoke nvme_error_detected() prior
to nvme_error_resume(). And we already disable the device (and cancel in-flight IO) in 
nvme_error_detected() and so wouldn't that help? 

Thanks,
--Nilay