[Bug Report] nvme-cli fails re-formatting NVMe namespace
Nilay Shroff
nilay at linux.ibm.com
Fri Mar 15 07:31:33 PDT 2024
Hi,
We found that the "nvme format ..." command fails to format an NVMe disk with the block size set to 512.
Notes and observations:
======================
This is observed on the latest Linus kernel tree. It was working well on kernel v6.8.
Test details:
=============
At system boot, or when the NVMe disk is hot plugged, its block size is 4096. If we later try to format
it with a block size of 512 (lbaf=2), the format fails. Interestingly, if we start with a block size of 512
and later format it with a block size of 4096 (lbaf=0), it does not fail.
Please note that CONFIG_NVME_MULTIPATH is enabled.
Please find below further details:
# lspci
0018:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            S6EUNA0R500358       1.6TB NVMe Gen4 U.2 SSD                  0x1        1.60 TB / 1.60 TB          512 B + 0 B      REV.SN49
# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze : 0xba4d4ab0
ncap : 0xba4d4ab0
nuse : 0xba4d4ab0
<snip>
<snip>
nlbaf : 4
flbas : 0
[6:5] : 0 Most significant 2 bits of Current LBA Format Selected
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0 Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better
LBA Format 3 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format 4 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
# lsblk -t /dev/nvme0n1
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
nvme0n1 0 4096 0 4096 4096 0 128 0B
(the PHY-SEC and LOG-SEC values above are 4096)
!!!! FAILING TO FORMAT with a block size of 512 bytes !!!!
# nvme format /dev/nvme0n1 --lbaf=2 --pil=0 --ms=0 --pi=0 -f
Success formatting namespace:1
failed to set block size to 512
^^^
# lsblk -t /dev/nvme0n1
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
nvme0n1 0 4096 0 4096 4096 0 128 0B
(PHY-SEC and LOG-SEC still report 4096 after the format)
# cat /sys/block/nvme0n1/queue/logical_block_size:4096
# cat /sys/block/nvme0n1/queue/physical_block_size:4096
# cat /sys/block/nvme0c0n1/queue/logical_block_size:512
# cat /sys/block/nvme0c0n1/queue/physical_block_size:512
# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze : 0xba4d4ab0
ncap : 0xba4d4ab0
nuse : 0xba4d4ab0
<snip>
<snip>
nlbaf : 4
flbas : 0x2
[6:5] : 0 Most significant 2 bits of Current LBA Format Selected
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0x2 Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
LBA Format 3 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format 4 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
Note: We can see above that the NVMe namespace is indeed formatted with lbaf 2 (block size 512).
However, the block queue limits of the head device are not correctly updated.
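For what it's worth, the stale limits on the head device can also be read from user space with the
BLKSSZGET/BLKPBSZGET ioctls. Below is a minimal stand-alone sketch (the device node is the one from this
report; the per-path nvme0c0n1 device is hidden, so its values are only visible through sysfs as above):

/* blksz.c - print the logical/physical block size the kernel reports
 * for a block device, e.g. "./blksz /dev/nvme0n1"
 */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        int fd, lbs = 0;
        unsigned int pbs = 0;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <blockdev>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
                perror(argv[1]);
                return 1;
        }
        /* BLKSSZGET: logical block size, BLKPBSZGET: physical block size */
        if (ioctl(fd, BLKSSZGET, &lbs) || ioctl(fd, BLKPBSZGET, &pbs)) {
                perror("ioctl");
                return 1;
        }
        printf("%s: logical=%d physical=%u\n", argv[1], lbs, pbs);
        close(fd);
        return 0;
}

On the affected kernel this should still report 4096/4096 for /dev/nvme0n1 after the format, matching
the sysfs values shown above.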
Git bisect:
==========
Git bisect reveals the following commit as bad commit:
8f03cfa117e06bd2d3ba7ed8bba70a3dda310cae is the first bad commit
commit 8f03cfa117e06bd2d3ba7ed8bba70a3dda310cae
Author: Christoph Hellwig <hch at lst.de>
Date: Mon Mar 4 07:04:51 2024 -0700
nvme: don't use nvme_update_disk_info for the multipath disk
Currently nvme_update_ns_info_block calls nvme_update_disk_info both for
the namespace attached disk, and the multipath one (if it exists). This
is very different from how other stacking drivers work, and leads to
a lot of complexity.
Switch to setting the disk capacity and initializing the integrity
profile, and let blk_stack_limits which already is called just below
deal with updating the other limits.
Signed-off-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Keith Busch <kbusch at kernel.org>
drivers/nvme/host/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
The above commit is part of the new atomic queue limits update patch series. For an NVMe
device with the multipath config enabled, we rely on blk_stack_limits() to update the queue
limits of the stacked device. When updating the logical/physical block size limits of the top
(nvme%dn%d) device, blk_stack_limits() uses the max of the top and bottom limits:
t->logical_block_size = max(t->logical_block_size,
b->logical_block_size);
t->physical_block_size = max(t->physical_block_size,
b->physical_block_size);
When we try formatting the NVMe disk with a block size of 512, the value of
t->logical_block_size is 4096 (the initial block size), whereas the value of
b->logical_block_size is 512 (the block size of the bottom device is updated first
in nvme_update_ns_info_block()).
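To make that concrete, here is a small stand-alone user-space model of the two assignments quoted
above (only an illustration of the stacking arithmetic, not kernel code); with the values from this
report, the head device's limits can never drop back to 512:

/* limits_stack.c - model of the max()-based block size stacking quoted above */
#include <stdio.h>

struct limits {
        unsigned int logical_block_size;
        unsigned int physical_block_size;
};

#define max(a, b) ((a) > (b) ? (a) : (b))

/* mirrors the two assignments from blk_stack_limits() quoted above */
static void stack(struct limits *t, const struct limits *b)
{
        t->logical_block_size = max(t->logical_block_size, b->logical_block_size);
        t->physical_block_size = max(t->physical_block_size, b->physical_block_size);
}

int main(void)
{
        struct limits top = { 4096, 4096 };   /* head (nvme0n1): still carries pre-format limits */
        struct limits bottom = { 512, 512 };  /* path (nvme0c0n1): already updated after format */

        stack(&top, &bottom);
        printf("top: logical=%u physical=%u\n",
               top.logical_block_size, top.physical_block_size);  /* prints 4096/4096 */
        return 0;
}

So as long as the top device still carries the pre-format 4096, the max() can never propagate the
smaller post-format value upward.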
I think we may want to update the queue limits of both the top and bottom devices in
nvme_update_ns_info_block(). Or is there some other way?
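As a rough sketch of that idea (again plain user-space C modelling the arithmetic, not a proposed
patch): if the head (top) device's limits were also refreshed from the newly queried LBA format
before the stacking step, the max() would no longer preserve the stale 4096:

/* Same model as above, but with the top limits refreshed from the new
 * format before stacking (user-space illustration of the suggestion only).
 */
#include <stdio.h>

#define max(a, b) ((a) > (b) ? (a) : (b))

int main(void)
{
        unsigned int new_bs = 512;    /* block size of the newly selected LBA format */
        unsigned int top = new_bs;    /* suggested: update the head (top) device too */
        unsigned int bottom = new_bs; /* path (bottom) device, already updated */

        top = max(top, bottom);       /* stacking now yields 512, not 4096 */
        printf("stacked logical_block_size = %u\n", top);
        return 0;
}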
Let me know if you need any further information.
Thanks,
--Nilay