[PATCH] nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices

Tue Jan 20 01:54:25 PST 2026

On 2023/7/13 21:30, Christoph Hellwig wrote:
> While duplicate IDs are still very harmful, including the potential to easily
> see changing devices in /dev/disk/by-id, it turns out they are extremely
> common for cheap end user NVMe devices.
> 
> Relax our check for them so that it doesn't reject the probe on
> single-ported PCIe devices, but prints a big warning instead.  In doubt
> we'd still like to see quirk entries to disable the potential for
> changing supposed stable device identifier links, but this will at least
> allow users who have two (or more) of these devices to use them without
> having to manually add a new PCI ID entry with the quirk through sysfs or
> by patching the kernel.
> 
> Co-developed-by: Sagi Grimberg <sagi at grimberg.me>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/core.c | 36 +++++++++++++++++++++++++++++++++---
>   1 file changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 47d7ba2827ff29..37b6fa74666204 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3431,10 +3431,40 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
>   
>   	ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
>   	if (ret) {
> -		dev_err(ctrl->device,
> -			"globally duplicate IDs for nsid %d\n", info->nsid);
> +		/*
> +		 * We've found two different namespaces on two different
> +		 * subsystems that report the same ID.  This is pretty nasty
> +		 * for anything that actually requires unique device
> +		 * identification.  In the kernel we need this for multipathing,
> +		 * and in user space the /dev/disk/by-id/ links rely on it.
> +		 *
> +		 * If the device also claims to be multi-path capable back off
> +		 * here now and refuse the probe the second device as this is a
> +		 * recipe for data corruption.  If not this is probably a
> +		 * cheap consumer device if on the PCIe bus, so let the user
> +		 * proceed and use the shiny toy, but warn that with changing
> +		 * probing order (which due to our async probing could just be
> +		 * device taking longer to startup) the other device could show
> +		 * up at any time.
> +		 */
>   		nvme_print_device_info(ctrl);
> -		return ret;
> +		if ((ns->ctrl->ops->flags & NVME_F_FABRICS) || /* !PCIe */
> +		    ((ns->ctrl->subsys->cmic & NVME_CTRL_CMIC_MULTI_CTRL) &&
> +		     info->is_shared)) {
> +			dev_err(ctrl->device,
> +				"ignoring nsid %d because of duplicate IDs\n",
> +				info->nsid);
> +			return ret;
> +		}
> +
> +		dev_err(ctrl->device,
> +			"clearing duplicate IDs for nsid %d\n", info->nsid);
> +		dev_err(ctrl->device,
> +			"use of /dev/disk/by-id/ may cause data corruption\n");
> +		memset(&info->ids.nguid, 0, sizeof(info->ids.nguid));
> +		memset(&info->ids.uuid, 0, sizeof(info->ids.uuid));
> +		memset(&info->ids.eui64, 0, sizeof(info->ids.eui64));
> +		ctrl->quirks |= NVME_QUIRK_BOGUS_NID;
>   	}
>   
>   	mutex_lock(&ctrl->subsys->lock);

Hi,

I’d like to discuss whether we should revisit the duplicate-ID check
for NVMe-oF transports, especially in HA dual-active setups.

In such HA configurations, a single LUN is exposed via multiple subsystems
(one per storage controller) to provide redundancy. Because both subsystems
export the same underlying namespace, it usually reports the same
UUID/NGUID/EUI64 on all paths.

With the logic introduced in this patch, a namespace with globally
duplicate IDs is still rejected unconditionally on fabrics controllers:

 > +		if ((ns->ctrl->ops->flags & NVME_F_FABRICS) || /* !PCIe */
 > +		    ((ns->ctrl->subsys->cmic & NVME_CTRL_CMIC_MULTI_CTRL) &&
 > +		     info->is_shared)) {

Concretely, with two subsystems exposing the same LUN in a dual-active
HA configuration:

- Only paths through whichever subsystem is probed first are used;
- When that controller fails, the host cannot fail over to the other
   subsystem because its namespace was ignored, effectively breaking HA.
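
To make the condition concrete, here is a minimal userspace model of the
branch quoted above. It is only a sketch, not kernel code: the struct, enum
and function names are made up for illustration, and the three booleans
stand in for the NVME_F_FABRICS flag, the NVME_CTRL_CMIC_MULTI_CTRL bit and
info->is_shared.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel state; not the real structs. */
struct dup_id_input {
	bool is_fabrics;      /* models ctrl->ops->flags & NVME_F_FABRICS */
	bool cmic_multi_ctrl; /* models subsys->cmic & NVME_CTRL_CMIC_MULTI_CTRL */
	bool ns_is_shared;    /* models info->is_shared */
};

enum dup_id_action {
	DUP_ID_IGNORE_NS, /* "ignoring nsid %d because of duplicate IDs" */
	DUP_ID_CLEAR_IDS, /* "clearing duplicate IDs for nsid %d" */
};

/* Mirrors the if () in the patch once a global duplicate was found. */
enum dup_id_action dup_id_decide(const struct dup_id_input *in)
{
	if (in->is_fabrics || (in->cmic_multi_ctrl && in->ns_is_shared))
		return DUP_ID_IGNORE_NS;
	return DUP_ID_CLEAR_IDS;
}

/* Two example inputs: an HA fabrics target and a single-ported PCIe SSD. */
void dup_id_examples(void)
{
	struct dup_id_input ha_fabrics = { true, true, true };
	struct dup_id_input cheap_pcie = { false, false, false };

	assert(dup_id_decide(&ha_fabrics) == DUP_ID_IGNORE_NS);
	assert(dup_id_decide(&cheap_pcie) == DUP_ID_CLEAR_IDS);
}
```

In this model the fabrics case always ends up in the "ignore" branch, no
matter what the CMIC bit or the shared flag say, which is the behaviour
described in the two points above.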

Would it make sense to:

1) relax the duplicate ID check for NVMe-oF HA dual-active use cases, or

2) add a module parameter (e.g., `nvme_core.allow_duplicate_ids`) so admins
    can opt in when they know their storage topology and accept the
    /dev/disk/by-id risks?
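
As a rough illustration of option 2, the following userspace sketch shows
one possible reading of how such an opt-in could gate the rejection. The
parameter does not exist today; `allow_duplicate_ids`, the policy struct and
the function name are all hypothetical. The sketch also leaves open whether
the IDs would be kept or cleared once the namespace is accepted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opt-in; models the proposed nvme_core.allow_duplicate_ids. */
struct dup_id_policy {
	bool allow_duplicate_ids; /* default false: keep the strict check */
};

/*
 * Returns true if a namespace with globally duplicate IDs would be
 * rejected.  is_fabrics and multi_ctrl_shared model the two halves of
 * the condition quoted from the patch.
 */
bool dup_id_reject(const struct dup_id_policy *p, bool is_fabrics,
		   bool multi_ctrl_shared)
{
	if (p->allow_duplicate_ids)
		return false; /* admin explicitly accepted the risk */
	return is_fabrics || multi_ctrl_shared;
}

/* Default stays strict; only the explicit opt-in changes the outcome. */
void dup_id_policy_examples(void)
{
	struct dup_id_policy strict = { false };
	struct dup_id_policy opt_in = { true };

	assert(dup_id_reject(&strict, true, false));
	assert(!dup_id_reject(&opt_in, true, false));
}
```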

Keeping the default strict is fine, but having an escape hatch would be
very helpful for HA deployments.

Thanks,
Xiaoke