Trying to understand BLKRRPART and the "failed to re-read partition table" error

Mon Jan 2 14:52:28 PST 2023

For several months I've been using nvme-cli 1.9 (the default on Ubuntu
18.04 LTS) to repeatedly format NVMe drives before running device-level
benchmarks with fio, without any issues. I always pass the block device
to the format command, e.g., /dev/nvme0n1.

I recently started creating file systems on the drives for file-based
benchmarking. Since then, formatting a drive that has had partitions, a
file system, and files created on it intermittently fails with ECOMM
when the partition table is re-read. I haven't been able to reproduce
the failure reliably. This is the error I'm talking about:

https://github.com/linux-nvme/nvme-cli/blob/c3db2bfda5346f68344a9e6d795319a7bf35d19e/nvme.c#L5212
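To check whether the re-read fails on its own, outside nvme-cli, a minimal C sketch like the following issues the same BLKRRPART ioctl on a block device and reports the errno it fails with. It is not nvme-cli's code. The function name `reread_partition_table` is hypothetical, and the only kernel interface it assumes is the Linux `BLKRRPART` request from `<linux/fs.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>     /* BLKRRPART */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from nvme-cli: ask the kernel to re-read
 * the partition table of the block device at `path`.
 * Returns 0 on success or a negative errno value on failure.
 */
int reread_partition_table(const char *path)
{
    struct stat st;
    int fd;
    int err = 0;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    /* BLKRRPART only applies to a block device such as /dev/nvme0n1. */
    if (fstat(fd, &st) < 0) {
        err = -errno;
    } else if (!S_ISBLK(st.st_mode)) {
        err = -ENOTBLK;
    } else if (ioctl(fd, BLKRRPART) < 0) {
        err = -errno;     /* capture errno before any other call */
        fprintf(stderr, "failed to re-read partition table: %s\n",
                strerror(-err));
    }

    close(fd);
    return err;
}
```

Calling `reread_partition_table("/dev/nvme0n1")` as root right after resume, without formatting first, might show whether the re-read itself returns the error independently of the format command.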

The current nvme-cli code has several guards around the problematic
BLKRRPART ioctl. One of them alone, "cfg.lbaf != prev_lbaf", means that
current nvme-cli would not issue BLKRRPART for my formats, since they
don't change the LBA format.
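The guarding idea can be sketched as a pure predicate. This is a simplified, hypothetical model that covers only two conditions, that the format target is a block device and that the LBA format index changed; the real nvme-cli code has more guards than this, and the name `should_reread_partitions` is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the guard: only re-read the partition
 * table when formatting through a block device and the LBA format index
 * actually changed. nvme-cli's real condition includes other checks.
 */
bool should_reread_partitions(bool is_block_device, int new_lbaf, int prev_lbaf)
{
    /* LBA format indexes are never negative. */
    assert(new_lbaf >= 0 && prev_lbaf >= 0);

    return is_block_device && new_lbaf != prev_lbaf;
}
```

Under this model, re-formatting /dev/nvme0n1 with its current LBA format, e.g. `should_reread_partitions(true, 0, 0)`, returns false and BLKRRPART is skipped entirely, while switching from format 0 to format 1 would still issue it.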

But I'd like to understand whether these guards actually fix the issue
I'm hitting in 1.9 or just mask it. That is, upgrading to current
nvme-cli would make the error go away, but if I changed how I format so
that BLKRRPART runs (e.g., by switching LBA formats), it seems possible
the error would come back. Were the protections for the block-device
format case added to avoid issues like the one I'm hitting? Or is it
unusual to see such an error?

In case it helps: neither the drive nor the namespace is mounted in any
way, and the format is issued immediately after the system resumes from
suspend. I'm happy to provide any other details that might help.

Thanks,
Nick