trying to gain understanding on BLKRRPART and "failed to re-read partition table" error

Fri Jan 6 10:55:51 PST 2023

I've got a sequence of steps that seems to reproduce the issue about
20% of the time. Adding a 10-second sleep after the resume from suspend
and before the format appears to make the problem go away, but I need
more runtime to be 100% certain.
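
As an alternative to a fixed sleep, here is a minimal sketch of retrying
the re-read with a backoff while it reports EBUSY. This is not what
nvme-cli does; the function name, the retry count, and the delays are
all made up for illustration. The only real pieces are fcntl.ioctl and
the BLKRRPART request number, which is _IO(0x12, 95) in <linux/fs.h>.

```python
import errno
import fcntl
import time

# BLKRRPART is defined as _IO(0x12, 95) in <linux/fs.h>.
BLKRRPART = 0x125F


def reread_partition_table(fd, attempts=5, delay=1.0,
                           ioctl=fcntl.ioctl, sleep=time.sleep):
    """Issue BLKRRPART on fd, retrying while the kernel reports EBUSY.

    Hypothetical helper: returns the number of attempts used. Any error
    other than EBUSY is re-raised immediately, and EBUSY is re-raised
    once the attempts are exhausted. The ioctl and sleep parameters
    exist only so the logic can be exercised without a real device.
    """
    for attempt in range(1, attempts + 1):
        try:
            ioctl(fd, BLKRRPART)
            return attempt
        except OSError as exc:
            if exc.errno != errno.EBUSY or attempt == attempts:
                raise
            sleep(delay)
            delay *= 2  # back off: 1s, 2s, 4s, ...
```

To try it against a real device, open the block device (for example
with os.open("/dev/nvme0n1", os.O_RDONLY), as root) and pass the file
descriptor in. If the EBUSY really is a short-lived race with the
resume-time scan, one would expect this to succeed on a later attempt.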

I captured the failure with strace and saw that the ioctl(BLKRRPART)
call is returning -1 EBUSY, which means one of the following is true:
  disk->open_partitions is nonzero
  (disk->part0->bd_holder && disk->part0->bd_holder != owner)

There are a couple of other cases later in the code that also return
-1 EBUSY, but I believe they do so only if one of the above is true.
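
To make the two conditions concrete, here is a small Python model of
just those checks. It is a sketch, not the kernel code: DiskState and
rescan_precheck are hypothetical names, and only the two fields
mentioned above are modeled.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskState:
    """Hypothetical stand-in for the two gendisk fields discussed above."""
    open_partitions: int = 0            # number of partitions currently open
    bd_holder: Optional[object] = None  # exclusive holder of the whole disk


def rescan_precheck(disk: DiskState, owner: Optional[object]) -> int:
    """Return 0 if a partition rescan may proceed, else -EBUSY."""
    # Condition 1: some partition on the disk is still open.
    if disk.open_partitions:
        return -errno.EBUSY
    # Condition 2: someone other than the caller holds the disk exclusively.
    if disk.bd_holder is not None and disk.bd_holder is not owner:
        return -errno.EBUSY
    return 0
```

For example, rescan_precheck(DiskState(open_partitions=1), None) returns
-errno.EBUSY, which is the first case: a partition being open at the
moment the ioctl arrives is enough to fail it.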

My best guess is that upon returning from resume, the nvme driver ends
up in nvme_first_scan which calls disk_scan_partitions. If nvme format
runs as this is happening, then the BLKRRPART it issues at the end
fails because disk->open_partitions is nonzero from the resume scan.

If that is indeed what is happening, I imagine that it could still
happen in the latest nvme-cli, but because there are now a number of
cases where BLKRRPART is not run, it is much less likely to be
encountered.

On Mon, Jan 2, 2023 at 5:25 PM Nick Neumann <nick at pcpartpicker.com> wrote:
>
> I was struggling to find why ioctl might return ECOMM, then found the
> change where nvme-cli stopped returning ECOMM for all negative
> statuses. I'll see if I can get the actual error code ioctl is
> returning.
>
> On Mon, Jan 2, 2023 at 4:52 PM Nick Neumann <nick at pcpartpicker.com> wrote:
> >
> > I've been using nvme-cli 1.9 (default on Ubuntu 18.04 LTS) for several
> > months formatting nvme drives repeatedly without issue before running
> > device-level benchmarks with fio. I always format passing in the block
> > device, e.g., /dev/nvme0n1, and haven't had any issues.
> >
> > I recently started playing with creating file systems on the drives to
> > do file-based benchmarking. After doing this, I've noticed that
> > intermittently the format on a drive that has had partitions,
> > filesystem, and files created will fail with code ECOMM on the
> > re-reading of the partition table. However, I've struggled to make the
> > failure happen reliably. This is the error I'm talking about:
> >
> > https://github.com/linux-nvme/nvme-cli/blob/c3db2bfda5346f68344a9e6d795319a7bf35d19e/nvme.c#L5212
> >
> > The current nvme code has multiple guards on running the problematic
> > BLKRRPART, and one of them, "cfg.lbaf != prev_lbaf", is enough that
> > current nvme-cli would not run the BLKRRPART for my nvme formats.
> >
> > But I'm hoping to understand if these guards are actually fixing the
> > issue I'm hitting in 1.9, or would just mask my issue (that is,
> > upgrading to current nvme-cli would make the error go away, but it
> > seems possible that changes to how I format that cause BLKRRPART to
> > run would bring it back). Are the added protections for the block device
> > format case to avoid issues like the one I am hitting? Or is it
> > unusual that I'm seeing such an error?
> >
> > In case it helps, the system does not have the drive or namespace
> > mounted in any way, and the format is being issued immediately after
> > the system resumes from a suspend. Happy to provide any more
> > details that might help.
> >
> > Thanks,
> > Nick