[Bug Report] nvme-cli fails re-formatting NVMe namespace

Nilay Shroff nilay at linux.ibm.com
Fri Mar 15 07:31:33 PDT 2024


Hi,

We found that the "nvme format ..." command fails to reformat an NVMe disk when the block size is set to 512.

Notes and observations:
====================== 
This is observed on the latest Linus kernel tree. It worked correctly on kernel v6.8.

Test details:
=============
At system boot, or when the NVMe disk is hot-plugged, the NVMe block size is 4096. If we later try to
format it with a block size of 512 (lbaf=2), the format fails. Interestingly, if we start with an NVMe block
size of 512 and later format it with a block size of 4096 (lbaf=0), it doesn't fail.
Please note that CONFIG_NVME_MULTIPATH is enabled.
 
Please find below further details:

# lspci 
0018:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X

# nvme list 
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            S6EUNA0R500358       1.6TB NVMe Gen4 U.2 SSD                  0x1          1.60  TB /   1.60  TB    512   B +  0 B   REV.SN49

# nvme id-ns /dev/nvme0n1 -H 
NVME Identify Namespace 1:
nsze    : 0xba4d4ab0
ncap    : 0xba4d4ab0
nuse    : 0xba4d4ab0

<snip>
<snip>

nlbaf   : 4
flbas   : 0
  [6:5] : 0	Most significant 2 bits of Current LBA Format Selected
  [4:4] : 0	Metadata Transferred in Separate Contiguous Buffer
  [3:0] : 0	Least significant 4 bits of Current LBA Format Selected
  
<snip>
<snip>  

LBA Format  0 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
LBA Format  1 : Metadata Size: 8   bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good 
LBA Format  2 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better 
LBA Format  3 : Metadata Size: 8   bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded 
LBA Format  4 : Metadata Size: 64  bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded 

# lsblk -t /dev/nvme0n1 
NAME    ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
nvme0n1         0   4096      0    4096    4096    0               128    0B
                                   ^^^     ^^^ 	

!!!! FAILING TO FORMAT with 512 bytes of block size !!!!

# nvme format /dev/nvme0n1 --lbaf=2 --pil=0 --ms=0 --pi=0 -f 
Success formatting namespace:1
failed to set block size to 512
^^^

# lsblk -t /dev/nvme0n1 
NAME    ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
nvme0n1         0   4096      0    4096    4096    0               128    0B
                                   ^^^     ^^^
# cat /sys/block/nvme0n1/queue/logical_block_size:4096
# cat /sys/block/nvme0n1/queue/physical_block_size:4096

# cat /sys/block/nvme0c0n1/queue/logical_block_size:512
# cat /sys/block/nvme0c0n1/queue/physical_block_size:512


# nvme id-ns /dev/nvme0n1 -H 
NVME Identify Namespace 1:
nsze    : 0xba4d4ab0
ncap    : 0xba4d4ab0
nuse    : 0xba4d4ab0
<snip>
<snip>
nlbaf   : 4
flbas   : 0x2
  [6:5] : 0	Most significant 2 bits of Current LBA Format Selected
  [4:4] : 0	Metadata Transferred in Separate Contiguous Buffer
  [3:0] : 0x2	Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>

LBA Format  0 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best 
LBA Format  1 : Metadata Size: 8   bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good 
LBA Format  2 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
LBA Format  3 : Metadata Size: 8   bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded 
LBA Format  4 : Metadata Size: 64  bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded 


Note: The output above shows that the NVMe disk is indeed formatted with lbaf 2 (block size 512). However,
the block queue limits are not updated correctly.

Git bisect:
==========
Git bisect identifies the following commit as the first bad commit:

8f03cfa117e06bd2d3ba7ed8bba70a3dda310cae is the first bad commit
commit 8f03cfa117e06bd2d3ba7ed8bba70a3dda310cae
Author: Christoph Hellwig <hch at lst.de>
Date:   Mon Mar 4 07:04:51 2024 -0700

    nvme: don't use nvme_update_disk_info for the multipath disk
    
    Currently nvme_update_ns_info_block calls nvme_update_disk_info both for
    the namespace attached disk, and the multipath one (if it exists).  This
    is very different from how other stacking drivers work, and leads to
    a lot of complexity.
    
    Switch to setting the disk capacity and initializing the integrity
    profile, and let blk_stack_limits which already is called just below
    deal with updating the other limits.
    
    Signed-off-by: Christoph Hellwig <hch at lst.de>
    Signed-off-by: Keith Busch <kbusch at kernel.org>

 drivers/nvme/host/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


The above commit is part of the new atomic queue limit updates patch series. For an
NVMe device with the multipath config enabled, we rely on blk_stack_limits() to
update the queue limits of the stacked device. To update the logical/physical
block size limits of the top (nvme%dn%d) device, blk_stack_limits() uses the max of the
top and bottom limits:

	t->logical_block_size = max(t->logical_block_size,
				    b->logical_block_size);

	t->physical_block_size = max(t->physical_block_size,
				     b->physical_block_size);

When we try formatting the NVMe disk with a block size of 512, the value of
t->logical_block_size is 4096 (the initial block size), whereas the
value of b->logical_block_size is 512 (the block size of the bottom device is updated first
in nvme_update_ns_info_block()). Since max(4096, 512) is 4096, the top device keeps a block size of 4096.
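The asymmetry between the two format directions can be reproduced with a small userspace model. This is only a sketch: struct toy_limits and the toy_* helpers are hypothetical names made up for illustration, and the model copies nothing from the kernel except the two max() assignments quoted above.

```c
/* Toy stand-in for the two struct queue_limits fields discussed above.
 * All toy_* names are hypothetical; this is not kernel code. */
struct toy_limits {
	unsigned int logical_block_size;
	unsigned int physical_block_size;
};

static unsigned int toy_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Models only the two max() assignments quoted from blk_stack_limits(). */
static void toy_stack_block_sizes(struct toy_limits *t,
				  const struct toy_limits *b)
{
	t->logical_block_size = toy_max(t->logical_block_size,
					b->logical_block_size);
	t->physical_block_size = toy_max(t->physical_block_size,
					 b->physical_block_size);
}

/*
 * The top device starts at `initial`; the bottom device has already been
 * updated to `formatted`. Returns the top device's logical block size
 * after stacking.
 */
static unsigned int toy_top_after_format(unsigned int initial,
					 unsigned int formatted)
{
	struct toy_limits top = { initial, initial };
	struct toy_limits bottom = { formatted, formatted };

	toy_stack_block_sizes(&top, &bottom);
	return top.logical_block_size;
}
```

In this model, toy_top_after_format(4096, 512) returns 4096, so the top device never sees the new 512-byte size, while toy_top_after_format(512, 4096) returns 4096 as expected. That matches the observation that only the 4096-to-512 direction fails.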

I think we may want to update the queue limits of both the top and bottom devices in
nvme_update_ns_info_block(). Or is there some other way?
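One way to picture this suggestion, again as a userspace toy and not as a proposed patch, is to refresh the top device's block sizes from the bottom device before the max() stacking step. The names sketch_limits, sketch_refresh_then_stack and sketch_top_after_format are hypothetical, and the model assumes a single bottom device; a real change would also have to consider multiple paths and the other queue limits.

```c
/* Hypothetical userspace model; none of these names exist in the kernel. */
struct sketch_limits {
	unsigned int logical_block_size;
	unsigned int physical_block_size;
};

static unsigned int sketch_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Assumption for illustration: seed the top limits from the bottom device
 * first, then apply the same max() stacking as before. With one bottom
 * device the max() step becomes a no-op, but it is kept to show the order.
 */
static void sketch_refresh_then_stack(struct sketch_limits *t,
				      const struct sketch_limits *b)
{
	t->logical_block_size = b->logical_block_size;
	t->physical_block_size = b->physical_block_size;

	t->logical_block_size = sketch_max(t->logical_block_size,
					   b->logical_block_size);
	t->physical_block_size = sketch_max(t->physical_block_size,
					    b->physical_block_size);
}

/* Returns the top device's logical block size after refresh and stacking. */
static unsigned int sketch_top_after_format(unsigned int initial,
					    unsigned int formatted)
{
	struct sketch_limits top = { initial, initial };
	struct sketch_limits bottom = { formatted, formatted };

	sketch_refresh_then_stack(&top, &bottom);
	return top.logical_block_size;
}
```

With the refresh step in place, sketch_top_after_format(4096, 512) returns 512 and sketch_top_after_format(512, 4096) returns 4096, so both format directions end up with the size of the bottom device in this simplified model.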

Let me know if you need any further information.

Thanks,
--Nilay








