[PATCH 1/1] nvme-pci: fix hang during error recovery when the PCI device is isolated
Stefan Roese
sr at denx.de
Mon Jul 18 02:30:54 PDT 2022
On 12.07.22 16:17, Keith Busch wrote:
> On Tue, Jul 12, 2022 at 02:44:53PM +0200, Niklas Schnelle wrote:
>> On s390 and powerpc PCI devices are isolated when an error is detected
>> and driver->err_handler->error_detected is called with an inaccessible
>> PCI device and PCI channel state set to pci_channel_io_frozen
>> (see Step 1 in Documentation/PCI/pci-error-recovery.rst).
>>
>> In the case of NVMe devices nvme_error_detected() then calls
>> nvme_dev_disable(dev, false) and requests a reset. After a successful
>> reset the device is accessible again and nvme_slot_reset() resets the
>> controller and queues nvme_reset_work() which then recovers the
>> controller.
>>
>> Since commit b98235d3a471 ("nvme-pci: harden drive presence detect in
>> nvme_dev_disable()") however nvme_dev_disable() no longer freezes the
>> queues if pci_device_is_present() returns false. This is the case for an
>> isolated PCI device. In principle this makes sense as there are no
>> accessible hardware queues to run. The problem though is that for
>> a previously live reset controller with online queues nvme_reset_work()
>> calls nvme_wait_freeze() which, without the freeze having been
>> initiated, then hangs forever. Fix this by starting the freeze in
>> nvme_slot_reset() which is the earliest point where we know the device
>> should be accessible again.
>>
>> Fixes: b98235d3a471 ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
>> Signed-off-by: Niklas Schnelle <schnelle at linux.ibm.com>
>
> Oh, we've messed up the expected sequence. The mistaken assumption is a device
> not present means we're about to unbind from it, but it could appear that way
> just for normal error handling and reset, so we need to preserve the previous
> handling.
>
> The offending commit really just wants to avoid the register access (which we
> shouldn't have to do, but hey, broken hardware...).
Correct.
> So let's keep the sequence
> the same as before and just skip the register read. Does this work for you?
>
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index fdfee3e590db..c40e82cee735 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> static void nvme_dev_remove_admin(struct nvme_dev *dev)
> @@ -2690,9 +2772,11 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
> struct pci_dev *pdev = to_pci_dev(dev->dev);
>
> mutex_lock(&dev->shutdown_lock);
> - if (pci_device_is_present(pdev) && pci_is_enabled(pdev)) {
> - u32 csts = readl(dev->bar + NVME_REG_CSTS);
> + if (pci_is_enabled(pdev)) {
> + u32 csts = ~0;
>
> + if (pci_device_is_present(pdev))
> + csts = readl(dev->bar + NVME_REG_CSTS);
> if (dev->ctrl.state == NVME_CTRL_LIVE ||
> dev->ctrl.state == NVME_CTRL_RESETTING) {
> freeze = true;
> --
Thanks. Looks good to me. So if anyone whats to send a proper patch with
this change, feel free to add my:
Reviewed-by: Stefan Roese <sr at denx.de>
Thanks,
Stefan
More information about the Linux-nvme
mailing list