[PATCH] nvme: remove pci device if no longer present

Keith Busch keith.busch at intel.com
Wed Jul 5 09:05:31 PDT 2017


[correcting linux-nvme in the CC]

On Wed, Jul 05, 2017 at 12:03:35PM -0400, Keith Busch wrote:
> On Sun, Jul 02, 2017 at 08:31:51AM -0700, Christoph Hellwig wrote:
> > Please CC the linux-nvme list on any nvme issues.  Also this
> > code is getting a little too fancy for living in nvme, I think we
> > need to move it into the PCI core, ensure we properly take drv->lock
> > to synchronize it, and check for dev->drv instead of the private data
> > which is a guesstimate.
> 
> I agree this sort of thing needs to go in the PCI layer as a common
> solution for all devices. The NVMe driver shouldn't be responsible for bus
> enumeration events. When we did that before, races with pciehp were a
> problem.
> 
> Also, we don't have a once-per-second health check event that would have
> been needed to even catch this event in the first place. To get here now,
> you'll have to issue an nvme reset or wait 60 seconds after sending an
> admin or IO command.
>  
> > On Fri, Jun 30, 2017 at 04:56:04PM -0700, Wei Zhang wrote:
> > > This patch removes the PCI device from the kernel's topology tree
> > > if the device is no longer present.
> > > 
> > > Commit ddf097ec1d44c9648c4738d7cf2819411b44253a (NVMe: Unbind driver on
> > > failure) left the PCI device in the kernel's topology upon device failure.
> > > However, this does not work well for the slot power off/on test cases.
> > > After a slot power off, we need to manually remove the PCI device
> > > before triggering the rescan, in order for the SSD to be rediscovered.
> > > 
> > > Fixes: ddf097ec1d44c9648c4738d7cf2819411b44253a
> > > Signed-off-by: Wei Zhang <wzhang at fb.com>
> > > Reviewed-by: Jens Axboe <axboe at fb.com>
> > > ---
> > >  drivers/nvme/host/pci.c | 15 +++++++++++++--
> > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > index 32a98e2..094b22f 100644
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -2174,8 +2174,19 @@ static void nvme_remove_dead_ctrl_work(struct work_struct *work)
> > >  	struct pci_dev *pdev = to_pci_dev(dev->dev);
> > >  
> > >  	nvme_kill_queues(&dev->ctrl);
> > > -	if (pci_get_drvdata(pdev))
> > > -		device_release_driver(&pdev->dev);
> > > +
> > > +	/*
> > > +	 * Remove the PCI device from the topology tree if the device is no longer
> > > +	 * present.  Without removing, slot power off/on test cannot re-discover
> > > +	 * the SSD.
> > > +	 */
> > > +	if (pci_get_drvdata(pdev)) {
> > > +		if (!pci_device_is_present(pdev)) {
> > > +			pci_stop_and_remove_bus_device_locked(pdev);
> > > +		} else {
> > > +			device_release_driver(&pdev->dev);
> > > +		}
> > > +	}
> > >  	nvme_put_ctrl(&dev->ctrl);
> > >  }


