[PATCH V3 0/3] Ensure ordered namespace registration during async scan

John Meneghini jmeneghi at redhat.com
Thu Feb 26 08:35:15 PST 2026


On 2/26/26 3:07 AM, Maurizio Lombardi wrote:
> On Wed Feb 25, 2026 at 10:41 PM CET, Keith Busch wrote:
>> On Wed, Feb 25, 2026 at 05:12:00PM +0100, Maurizio Lombardi wrote:
>>> The NVMe fully asynchronous namespace scanning introduced in
>>> commit 4e893ca81170 ("nvme-core: scan namespaces asynchronously")
>>> significantly improved discovery times. However, it also introduced
>>> non-deterministic ordering for namespace registration.
>>>
>>> While kernel device names (/dev/nvmeXnY) are not guaranteed to be stable
>>> across reboots, this unpredictable ordering has caused considerable user
>>> confusion and has been perceived as a regression, leading to multiple bug
>>> reports.
>>
>> The nvme-pci driver also probes the controllers asynchronously, which
>> can also create non-deterministic names. Is that part not a problem?
> 
> Potentially, it is. The difference is that so far no one ever complained
> about it, while with namespace async scanning we immediately received regression
> reports, to the point we had to revert the changes and restore the
> sequential namespaces scan in RHEL.

It's worse than this.  Yes, in RHEL we carry out-of-tree patches to turn off async scanning with SCSI,
and we reverted this async namespace scanning patch in NVMe.

We had to do this because, as soon as we turned these async scanning mechanisms on, we immediately
received customer escalations. Customers were not able to upgrade their systems. We have customer issues
and complaints open about this, and we see this async namespace scanning as a barrier to adoption with NVMe -
especially with NVMe-oF, which tends to have many more namespaces than PCIe.

We've talked about this at LSF/MM - more than once - and several solutions have been proposed in the past,
but nothing ever happened.

And yes, the PCIe async discovery stuff does cause some problems.  The difference is: the PCIe bus configuration does
not change nearly as often as, e.g., the NVMe namespace configuration in a fabric, so customers don't notice the changing PCI IDs.
Unless someone is doing lots of hot unplugging and plugging on their PCI bus, the PCI IDs typically don't change at all.

So from boot to boot, PCI IDs don't usually change.  This async namespace scanning causes the namespace IDs to change with every reboot, especially on
a system with hundreds of NVMe-oF namespaces.

So, we really need this change, or something like this, to be accepted upstream.

/John



