[PATCH] nvme-pci: try function level reset on init failure

Keith Busch kbusch at kernel.org
Tue Jul 15 06:30:00 PDT 2025


On Tue, Jul 15, 2025 at 09:45:58AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 14, 2025 at 10:13:28AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch at kernel.org>
> > 
> > I've encountered various nvme devices that for whatever reason are stuck
> > in a reset state. Historically these have required a power cycle to make
> > them usable again. Vendors don't report any problem with the device when
> > we ship these for analysis.
> 
> Who is the "we" here?

Meta.
 
> > In many cases, a PCIe FLR is sufficient to restart operation without a
> > power cycle. Try it if controller reset fails the first time.
> 
> Why is that only done in the probe path and not the runtime reset path?

nvme_pci_configure_admin_queue() is called for both probe and
reset_work.

Is it because I wrote "fails the first time"? I mean the first reset for
each initialization attempt, whether it happens during probe or a later
reset. The code path will try an FLR on every single nvme reset if
CSTS.RDY doesn't clear as expected.

> > +	if (result < 0) {
> > +		struct pci_dev *pdev = to_pci_dev(dev->dev);
> > +
> > +		result = pcie_flr(pdev);
> > +		if (result < 0)
> > +			return result;
> > +		pci_restore_state(pdev);
> > +
> > +		result = nvme_disable_ctrl(&dev->ctrl, false);
> > +		if (result < 0)
> > +			return result;
> > +	}
> 
> Either way this warrants a big comment explaining what we are doing
> here.

Sure, no problem. I think it should also emit a dev_warn() if the second
nvme_disable_ctrl() call succeeds, to indicate that an FLR was needed to
get the expected response.
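
For illustration only, here is a rough userspace sketch of that flow. It
is not the driver code: struct mock_ctrl, the mock_* helpers, and the
warning text are all hypothetical stand-ins. Only the order of operations
(disable, FLR on failure, disable again, warn if the retry worked) follows
the hunk quoted above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a controller; not the kernel's struct nvme_dev. */
struct mock_ctrl {
	bool stuck;	/* CSTS.RDY will not clear until the device is reset */
	bool flr_works;	/* whether an FLR un-sticks this particular device */
	int flr_count;	/* number of FLRs issued */
	int warn_count;	/* number of warnings logged */
};

/* Mock of the disable step: fails while the device is stuck. */
static int mock_disable_ctrl(struct mock_ctrl *c)
{
	return c->stuck ? -ENODEV : 0;
}

/* Mock of the function level reset: may or may not recover the device. */
static int mock_pcie_flr(struct mock_ctrl *c)
{
	c->flr_count++;
	if (c->flr_works)
		c->stuck = false;
	return 0;
}

/*
 * Mock of the shared init path. Every caller, whether it models probe or
 * a later reset, goes through this one function, so the FLR fallback
 * applies to each initialization attempt.
 */
static int mock_configure_admin_queue(struct mock_ctrl *c)
{
	int result = mock_disable_ctrl(c);

	if (result < 0) {
		/*
		 * The controller did not acknowledge the disable. Try a
		 * function level reset before giving up; some devices
		 * recover this way without a power cycle.
		 */
		result = mock_pcie_flr(c);
		if (result < 0)
			return result;
		/* The real driver restores PCI config state at this point. */

		result = mock_disable_ctrl(c);
		if (result < 0)
			return result;

		/* Assumed wording; stands in for the proposed dev_warn(). */
		c->warn_count++;
		fprintf(stderr, "controller disable succeeded only after FLR\n");
	}

	return 0;
}
```

In this sketch a healthy device never sees an FLR, a stuck device that
recovers gets exactly one FLR and one warning per attempt, and a device
that stays stuck still fails with the error from the second disable.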


