[PATCH] nvme-pci: try function level reset on init failure
Keith Busch
kbusch at kernel.org
Tue Jul 15 06:30:00 PDT 2025
On Tue, Jul 15, 2025 at 09:45:58AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 14, 2025 at 10:13:28AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch at kernel.org>
> >
> > I've encountered various nvme devices that for whatever reason are stuck
> > in a reset state. Historically these have required a power cycle to make
> > them usable again. Vendors don't report any problem with the device when
> > we ship these for analysis.
>
> Who is the "we" here?

Meta.

> > In many cases, a PCIe FLR is sufficient to restart operation without a
> > power cycle. Try it if controller reset fails the first time.
>
> Why is that only done in the probe path and not the runtime reset path?

nvme_pci_configure_admin_queue() is called for both probe and
reset_work.

Is it because I wrote "fails the first time"? I mean the first reset for
each initialization attempt, whether it happens during probe or a later
reset. The code path will try an FLR on every single nvme reset if
CSTS.RDY doesn't clear as expected.

> > +	if (result < 0) {
> > +		struct pci_dev *pdev = to_pci_dev(dev->dev);
> > +
> > +		result = pcie_flr(pdev);
> > +		if (result < 0)
> > +			return result;
> > +		pci_restore_state(pdev);
> > +
> > +		result = nvme_disable_ctrl(&dev->ctrl, false);
> > +		if (result < 0)
> > +			return result;
> > +	}
>
> Either way this warrants a big comment explaining what we are doing
> here.

Sure, no problem. I think a dev_warn() is also in order if the second
nvme_disable_ctrl() call succeeds, to indicate that an FLR was needed to
get the expected response.
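
For illustration only, here is a minimal user-space sketch of the flow
discussed above: try to disable the controller, fall back to an FLR if
that fails, retry the disable, and warn when the retry succeeds. It is
not the patch itself. struct mock_ctrl and all mock_* functions are
hypothetical stand-ins for the kernel's nvme_disable_ctrl(), pcie_flr()
and dev_warn(), the -1 error value is a placeholder, and the
pci_restore_state() step is only marked with a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a controller; not the kernel's struct nvme_dev. */
struct mock_ctrl {
	bool stuck;      /* true while CSTS.RDY refuses to clear */
	bool flr_works;  /* assumption: whether an FLR un-sticks this device */
	int flr_count;   /* number of FLRs issued so far */
	int warn_count;  /* number of warnings logged so far */
};

/* Mock of nvme_disable_ctrl(): fails for as long as the device is stuck. */
static int mock_disable_ctrl(struct mock_ctrl *c)
{
	return c->stuck ? -1 : 0;
}

/* Mock of pcie_flr(): always "succeeds", and may un-stick the device. */
static int mock_flr(struct mock_ctrl *c)
{
	c->flr_count++;
	if (c->flr_works)
		c->stuck = false;
	return 0;
}

/*
 * Mock of the common init step shared by probe and reset_work, so the
 * FLR fallback applies to every initialization attempt.
 */
int mock_configure_admin_queue(struct mock_ctrl *c)
{
	int result;

	assert(c != NULL);

	result = mock_disable_ctrl(c);
	if (result < 0) {
		/* The first disable failed: try a function level reset. */
		result = mock_flr(c);
		if (result < 0)
			return result;
		/* The quoted patch calls pci_restore_state() at this point. */

		result = mock_disable_ctrl(c);
		if (result < 0)
			return result;

		/* Stand-in for the proposed dev_warn(). */
		c->warn_count++;
		fprintf(stderr, "controller needed an FLR to disable\n");
	}
	return 0;
}
```

In this sketch a healthy device never sees an FLR, a stuck but
recoverable device gets exactly one FLR and one warning, and a device
that stays stuck after the FLR returns the error from the second
disable attempt.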