[PATCH] nvme-pci: fix race between pci reset and nvme probe

Ming Lei ming.lei at redhat.com
Mon Aug 1 17:05:09 PDT 2022


On Mon, Aug 01, 2022 at 08:33:01AM -0600, Keith Busch wrote:
> On Mon, Aug 01, 2022 at 08:57:53PM +0800, Ming Lei wrote:
> > After nvme_probe() returns, device lock is released, and PCI reset
> > handler may come, meantime reset work is just scheduled and should
> > be in-progress.
> > 
> > When nvme_reset_prepare() is run, all NSs may not be allocated yet
> > and each NS's request queue won't be frozen by nvme_dev_disable().
> > 
> > But when nvme_reset_done() is called for resetting controller, all
> > NSs may have been scanned successfully, and nvme_wait_freeze() is
> > called on un-frozen request queues, then wait forever.
> > 
> > Fix the issue by holding device lock for resetting from nvme probe.
> > 
> > Reported-by: Yi Zhang <yi.zhang at redhat.com>
> > Link: https://lore.kernel.org/linux-block/CAHj4cs--KPTAGP=jj+7KMe=arDv=HeGeOgs1T8vbusyk=EjXow@mail.gmail.com/#r
> > Signed-off-by: Ming Lei <ming.lei at redhat.com>
> > ---
> >  drivers/nvme/host/pci.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 4232192e10dd..d49b1a082983 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -3075,9 +3075,14 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
> >  static void nvme_async_probe(void *data, async_cookie_t cookie)
> >  {
> >  	struct nvme_dev *dev = data;
> > +	struct pci_dev *pdev = to_pci_dev(dev->dev);
> >  
> > +	pci_dev_lock(pdev);
> > +	nvme_reset_ctrl(&dev->ctrl);
> >  	flush_work(&dev->ctrl.reset_work);
> >  	flush_work(&dev->ctrl.scan_work);
> > +	pci_dev_unlock(pdev);
> > +
> >  	nvme_put_ctrl(&dev->ctrl);
> >  }
> 
> When low on memory, async_schedule() falls back to calling the requested
> function directly, so this would deadlock on taking the pci_dev_lock() the
> second time within the probe context.

Good catch, maybe async_schedule() should provide one variation by
returning failure in case of OOM.

> 
> If it is successfully scheduled asynchronously, holding the lock blocks a hot
> removal, which might be the only thing that can unblock the nvme reset_work
> from forward progress.

OK, adding one reset or pci_reset lock may avoid to affect hot removal.

> 
> If you are encountering a nvme_reset_prepare() condition during scanning, that
> might indicate a failure to communicate with the end device. The scan work may
> need the error handling to unblock it.

Not sure I get your point. nvme_reset_prepare() is called from storing
to pci device's reset sysfs attr, so it can come any time, including
scanning.

Thanks,
Ming




More information about the Linux-nvme mailing list