[External] : Re: [PATCH 1/1] nvme-pci: Add quirk for Samsung PM173X with Subsystem Vendor id:0x108e

Shminderjit Singh shminderjit.singh at oracle.com
Fri May 16 01:13:44 PDT 2025


Hi Christoph,

Please find below an explanation of the bug:

This firmware bug started affecting systems when the default value of use_devicesfile in lvm.conf was changed to 1. The default used to be 0.
Conditions required for the issue to occur:
1. The system is configured for dual boot.
2. The use_devicesfile value in /etc/lvm.conf is 1, either by default or because it was set to 1 during initial system setup.
3. An NVMe device whose firmware has this bug.
4. The two kernels in the dual boot setup differ in how they handle the nvme quirks: one kernel has NVME_QUIRK_BOGUS_NSID set for this Samsung PM173X device and the other does not.

Behavior:
The system is first booted into the kernel that does not have the NVME_QUIRK_BOGUS_NSID quirk set. The root volume and other partitions are configured on the Samsung NVMe device, and the corresponding device entries are created in /etc/lvm/device/devicesfile based on the EUID. The entries in this file limit which devices LVM can see. These EUIDs are unique and are requested directly from the device.
After the setup is configured, the system is rebooted into the second kernel, which has the NVME_QUIRK_BOGUS_NSID quirk set. When the quirk is set, the nvme driver generates an EUID instead of requesting it from the device; this EUID has the format nvme-<vendor id>-<serial no in hex>-<model no. in hex>-<namespace id>. Since use_devicesfile is set, LVM relies on the entries in /etc/lvm/device/devicesfile. Because the EUID formats differ between the two kernels, LVM fails to identify the device. As a result, any volume residing on this device becomes invisible, and the boot fails because no root volume is available.
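
To make the generated format concrete, here is a rough Python sketch of how an ID of that shape could be assembled. It is only an illustration of the format described above, not the driver code. The function name, the field widths (4 hex digits for the vendor id, 8 for the namespace id) and the sample values are assumptions.

```python
def generated_id(vendor_id: int, serial: str, model: str, nsid: int) -> str:
    """Build an ID shaped like nvme-<vendor id>-<serial hex>-<model hex>-<namespace id>.

    Field widths are assumed for illustration only.
    """
    serial_hex = serial.encode("ascii").hex()  # serial number bytes rendered as hex
    model_hex = model.encode("ascii").hex()    # model number bytes rendered as hex
    return f"nvme-{vendor_id:04x}-{serial_hex}-{model_hex}-{nsid:08x}"


# Hypothetical sample values, not taken from a real device.
print(generated_id(0x144D, "S1", "PM", 1))
```

An ID built this way can never equal the identifier that the device itself reports, so an entry recorded under one kernel cannot match under the other.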

The failure also works the other way around, i.e., if the system is configured with a kernel that has the NVME_QUIRK_BOGUS_NSID quirk set and is then booted into a kernel that does not have it set. The mismatch in how the ID is obtained again leaves the volume invisible.

So, the issue is the inconsistency in EUID between the two kernels: unless both kernels have the same quirk set or unset, the volumes will remain invisible in one of the kernels.
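
The effect can be modelled with a small Python sketch. This is a simplified model of ID-based filtering, not LVM's actual matching logic. The function name, device name and ID strings below are all hypothetical.

```python
def visible_devices(recorded_ids, reported_ids):
    """Return the device names whose currently reported ID matches a recorded entry."""
    recorded = set(recorded_ids)
    return [name for name, dev_id in reported_ids.items() if dev_id in recorded]


# ID recorded while running the kernel without the quirk (hypothetical value).
recorded = ["eui.0123456789abcdef"]

# What each kernel reports for the same namespace (hypothetical values).
kernel_without_quirk = {"nvme0n1": "eui.0123456789abcdef"}
kernel_with_quirk = {"nvme0n1": "nvme-144d-5331-504d-00000001"}

print(visible_devices(recorded, kernel_without_quirk))  # device is found
print(visible_devices(recorded, kernel_with_quirk))     # nothing matches
```

Swapping which kernel recorded the entry gives the same result in the other direction, which is why the quirk has to be set or unset consistently in both kernels.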

Thanks,
Shminder

> -----Original Message-----
> From: Christoph Hellwig <hch at lst.de>
> Sent: Friday, May 16, 2025 12:46 PM
> To: Shminderjit Singh <shminderjit.singh at oracle.com>
> Cc: Christoph Hellwig <hch at lst.de>; kbusch at kernel.org; axboe at fb.com;
> sagi at grimberg.me; linux-nvme at lists.infradead.org; Junxiao Bi
> <junxiao.bi at oracle.com>
> Subject: [External] : Re: [PATCH 1/1] nvme-pci: Add quirk for Samsung
> PM173X with Subsystem Vendor id:0x108e
> 
> On Fri, May 16, 2025 at 07:13:47AM +0000, Shminderjit Singh wrote:
> > Hi Christoph,
> >
> > This device is widely used both within and outside the organization.
> > Applying a firmware fix is not a viable option. Since this is a
> > firmware-specific bug, I am adding a flag for OEM-specific devices only.
> 
> The general rule is that for enterprise devices we expect you to fix firmware
> bugs in firmware.  And skipping all IDs suddenly for an old device also seems
> like a really odd bug report that you did not even manage to explain.
