nvme/pcie hot plug results in /dev name change

Keith Busch kbusch at kernel.org
Tue Jan 31 08:38:47 PST 2023


On Sun, Jan 29, 2023 at 06:28:05PM +0800, Ming Lei wrote:
> On Fri, Jan 20, 2023 at 11:01:53PM -0800, Christoph Hellwig wrote:
> > On Fri, Jan 20, 2023 at 02:42:23PM -0700, Keith Busch wrote:
> > > That is correct. We don't know the identity of the device at the point
> > > we have to assign it an instance number, so the hot added one will just
> > > get the first available unique number. If you need a consistent name, we
> > > have the persistent naming rules that should create those links in
> > > /dev/disk/by-id/.
> > 
> > Note that this is a bit of a problem under a file system or stacking driver
> > that handles failing drives (e.g. btrfs or md raid), that holds on to
> > the "old" device file, and then fails to find the new one.  I had a
> > customer complaint for that as well :)
> > 
> > The first hack was to force run the multipath code that can keep the
> > node alive.  That works, but is really ugly especially when dealing
> > with corner cases such as overlapping nsids between different
> > controllers.
> > 
> > In the long run I think we'll need to:
> >  - send a notification to the holder if a device is hot removed from
> >    the block layer so that it can clean up
> 
> When the disk is deleted, the notification has been sent to userspace
> via udev/kobj uevent, so user can umount the original FS or
> DM/MD userspace can handle the device removal.
> 
> >  - make the upper layers look for the replugged device
> > 
> > I've been working on some of this for a while but haven't made much
> > progress due to other commitments.
> 
> Persistent block device naming is supposed to be handled by userspace,
> for example through udev rules.

Come to think of it, I actually have heard many complaints about this behavior.
Requiring user space to deal with the teardown and restore of their open files
and mount points on a transient link loss can be inconvenient. Example use cases
are firmware activation requiring a Subsystem Reset, or a PCIe error
containment event. Those cause the links to bounce, which can trigger hot plug
events on some platforms.

The native nvme multipath looks like it could be leveraged to improve that
user experience if we wanted to make that layer an option for non-multipath
devices.
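
For anyone relying on the persistent names in the meantime, here is a rough
user space sketch of looking up which kernel node a /dev/disk/by-id link
currently points at. It only assumes the links are ordinary symlinks; the
function name and the "nvme-" prefix filter are illustrative choices of mine,
not anything the driver or udev guarantees.

```python
import os


def resolve_persistent_links(by_id_dir="/dev/disk/by-id"):
    """Map each persistent link name to the device node it currently resolves to.

    Hypothetical helper: it simply follows the symlinks found in by_id_dir.
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # realpath follows the link to the current kernel name,
            # which may change across a hot plug event
            mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for name, node in resolve_persistent_links().items():
        # Assumption: NVMe links created by udev start with "nvme-"
        if name.startswith("nvme-"):
            print(f"{name} -> {node}")
```

Re-running the lookup after a hot plug event shows whether the same link now
resolves to a different instance number. It does nothing for a holder that
still has the old node open, which is the problem discussed above.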


