[PATCH] nvme-pci: Harden drive presence detect in nvme_dev_disable()

Max Gurtovoy mgurtovoy at nvidia.com
Thu May 12 06:03:39 PDT 2022


On 5/6/2022 1:15 PM, Stefan Roese wrote:
> On our ZynqMP system we observe, that a NVMe drive that resets itself
> while doing a firmware update causes a Kernel crash like this:
>
> [ 67.720772] pcieport 0000:02:02.0: pciehp: Slot(2): Link Down
> [ 67.720783] pcieport 0000:02:02.0: pciehp: Slot(2): Card not present
> [ 67.720795] nvme 0000:04:00.0: PME# disabled
> [ 67.720849] Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
> [ 67.720853] nwl-pcie fd0e0000.pcie: Slave error
>
> Analysis: When nvme_dev_disable() is called because of this PCIe hotplug
> event, pci_is_enabled() is still true. And accessing the NVMe drive
> which is currently not available as it's in reboot process causes this
> "synchronous external abort" on this ARM64 platform.
>
> This patch adds the pci_device_is_present() check as well, which returns
> false in this "Card not present" hot-plug case. With this change, the
> NVMe driver does not try to access the NVMe registers any more and the
> FW update finishes without any problems.
>
> Signed-off-by: Stefan Roese <sr at denx.de>
> Cc: Keith Busch <kbusch at kernel.org>
> Cc: Jens Axboe <axboe at fb.com>
> Cc: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/pci.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 97afeb898b25..c764a2a1d7f1 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2473,7 +2473,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>   	struct pci_dev *pdev = to_pci_dev(dev->dev);
>   
>   	mutex_lock(&dev->shutdown_lock);
> -	if (pci_is_enabled(pdev)) {
> +	if (pci_device_is_present(pdev) && pci_is_enabled(pdev)) {

sound weird to me that pci device can be enabled but not present.

Also why we get hot unplug event for FW upgrade ?

is this the correct behavior ?

>   		u32 csts = readl(dev->bar + NVME_REG_CSTS);
>   
>   		if (dev->ctrl.state == NVME_CTRL_LIVE ||



More information about the Linux-nvme mailing list