[PATCH v3 0/6] block: fix integrity offset/length conversions

Caleb Sander Mateos csander at purestorage.com
Thu Apr 23 11:02:50 PDT 2026


On Mon, Apr 20, 2026 at 7:09 PM Martin K. Petersen
<martin.petersen at oracle.com> wrote:
>
>
> Hi Caleb!
>
> > NVM Command Set specification 1.1 section 5.3.3 requires the reference
> > tag to increment by 1 per logical block, so that seems to determine
> > the increment unit:
>
> SCSI allows PI to be interleaved at intervals smaller than the logical
> block size. This was done for PI compatibility in mixed environments
> with both 512[en] and 4Kn disks. Interleaving allows 8 bytes of PI per
> 512 bytes of data on devices using 4 KB logical blocks. That is the
> reason why we use the term "integrity interval" instead of assuming
> logical block size.

Thanks for the explanation; I'm not too familiar with SCSI. Wherever
the integrity interval differs from the logical block size, my
explanation should be read in terms of integrity intervals.
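
To put numbers on the interleaving case quoted above, here is a
minimal sketch. It assumes 8-byte PI tuples, as in that example; the
function name is made up for illustration.

```python
PI_TUPLE_BYTES = 8  # assumption: 8 bytes of PI per integrity interval


def pi_layout(logical_block_size, integrity_interval):
    """Return (intervals per logical block, PI bytes per logical block)."""
    if logical_block_size % integrity_interval != 0:
        raise ValueError("interval must evenly divide the logical block")
    intervals = logical_block_size // integrity_interval
    return intervals, intervals * PI_TUPLE_BYTES


# 4 KB logical blocks with PI interleaved every 512 bytes
interleaved = pi_layout(4096, 512)
# 4 KB logical blocks with one PI tuple per block
per_block = pi_layout(4096, 4096)
```

In the interleaved case the ref tag advances once per 512-byte
interval rather than once per 4 KB logical block.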

>
> > The ref tag used for a particular block needs to be consistent. And
> > since reftag(block N) can be computed as reftag(block M) + N - M if
> > block N is accessed as part of an I/O that begins at block M, the
> > function must be of the form reftag(block N) = N + c for some constant
> > c. Thus, the ref tag seed needs to be computed in units of logical
> > blocks (integrity intervals); no other unit (e.g. 512-byte sectors)
> > works.
>
> Whoever attaches the PI decides on the seed value. In the case of the
> block layer it made sense to use block layer sector number since that
> value is inevitably going to be the same for a future read.

I'm not following "going to be the same for a future read". The block
can be read back by an I/O with a different starting
offset/sector/seed, as my example illustrates. When the integrity
interval size differs from the sector size (512 bytes), mixing the two
units results in a different ref tag seed for the block depending on
the starting offset of the I/O.
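
A small sketch of that arithmetic, as a toy model rather than kernel
code. It looks only at the tags generated from the seed, before any
remapping on the way to or from the device, and the function names
are hypothetical.

```python
SECTOR_SIZE = 512


def ref_tag_sector_seed(io_start, target, interval_size):
    """Tag for the interval at byte offset `target` when the seed is the
    I/O's starting 512-byte sector and the tag grows by 1 per interval."""
    seed = io_start // SECTOR_SIZE
    return seed + (target - io_start) // interval_size


def ref_tag_interval_seed(io_start, target, interval_size):
    """Same, but with the seed counted in integrity intervals."""
    seed = io_start // interval_size
    return seed + (target - io_start) // interval_size


INTERVAL = 4096          # 4 KB integrity interval
lba1 = 1 * INTERVAL      # byte offset of LBA 1

# Sector seed: the tag for LBA 1 depends on where the I/O starts.
tag_written = ref_tag_sector_seed(0, lba1, INTERVAL)      # write from LBA 0
tag_expected = ref_tag_sector_seed(lba1, lba1, INTERVAL)  # read from LBA 1

# Interval seed: both I/Os agree on the tag for LBA 1.
tag_written_iv = ref_tag_interval_seed(0, lba1, INTERVAL)
tag_expected_iv = ref_tag_interval_seed(lba1, lba1, INTERVAL)
```

With the sector seed the two values are 1 and 8; with the interval
seed both are 1.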

>
> Note that with MD, DM, and partitioning in the mix, the sector number
> seen by whoever submits the I/O is going to be different from the LBAs
> on the target devices which eventually receive the I/O. Nobody says
> there is a computable constant offset. Think scattered LVM extent
> allocations. Or RAID stripes placed at mismatched LBA offsets.

The constant offset relationship still needs to hold over any
contiguous range of a backing block device that can be accessed by a
single I/O. For example, with partitions, it's not possible for a
single I/O to cross a partition boundary, so each partition can have a
different constant offset between the ref tags and absolute integrity
interval numbers. With RAID, each shard can have a different constant
offset, and so on.
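
As a rough illustration of this constraint (a sketch only; the
offsets and the helper name are invented for the example):

```python
def io_ref_tags(start_interval, count, offset):
    """Ref tags for one I/O: the seed applies to the first interval and
    the tag grows by 1 per interval. Returns {interval number: tag}."""
    seed = start_interval + offset
    return {start_interval + i: seed + i for i in range(count)}


# Hypothetical constant offsets for two regions that no single I/O
# can span (e.g. two partitions).
OFFSET_A = 1000
OFFSET_B = -64

long_io = io_ref_tags(0, 8, OFFSET_A)        # intervals 0..7 of region A
short_io = io_ref_tags(5, 2, OFFSET_A)       # intervals 5..6 of region A
region_b_io = io_ref_tags(64, 4, OFFSET_B)   # intervals 64..67 of region B
```

Any two I/Os inside region A agree on the tag for a shared interval
(interval 5 gets 1005 either way), while region B is free to use a
different constant.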

>
> > To see the issue with the current approach, consider an example
> > accessing LBA 1 on a device with a 4 KB block size. If the block is
> > written as part of a write that begins at LBA 0, its ref tag in the
> > generated PI will be 1 (sector 0 + 1 integrity interval). If it's
> > later read by a read starting at LBA 1, its expected ref tag will be 8
> > (sector 8 + 0 integrity intervals), and the auto-integrity code will
> > fail the read due to a reftag mismatch.
>
> Something is broken, then. Because the ref tag in the received PI should
> have been remapped to start at 8 in that case.

Ah, I missed the remapping piece. Thanks for pointing that out. I
guess I was testing with a ublk device that doesn't advertise
BLK_INTEGRITY_REF_TAG. Since commit 203247c5cb97 ("blk-integrity:
support arbitrary buffer alignment"), the ref tag is unconditionally
set in the PI from the (sector) seed, but the remapping is conditional
on BLK_INTEGRITY_REF_TAG. That explains why I was seeing ref tags in
the PI that didn't match the integrity interval numbers.
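
A toy model of the behavior I'm describing, not the kernel code; the
function and parameter names are hypothetical, and the remap step is
simplified to a plain rewrite of seed-based tags.

```python
def generate_ref_tags(seed, count):
    """Tags are always generated from the seed, +1 per interval."""
    return [seed + i for i in range(count)]


def tags_sent_to_device(tags, start_interval, has_ref_tag_flag):
    """Rewrite seed-based tags to interval numbers, but only when the
    device advertises the ref tag flag (BLK_INTEGRITY_REF_TAG here)."""
    if not has_ref_tag_flag:
        return list(tags)  # no remapping: seed-based tags go out as-is
    return [start_interval + i for i in range(len(tags))]


# 4 KB intervals, write of 2 intervals starting at LBA 1 (sector 8)
generated = generate_ref_tags(seed=8, count=2)
with_flag = tags_sent_to_device(generated, 1, has_ref_tag_flag=True)
without_flag = tags_sent_to_device(generated, 1, has_ref_tag_flag=False)
```

With the flag the device sees tags 1 and 2; without it the device
sees 8 and 9, which is what I observed on my ublk setup.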

So it seems like patch 1 ("block: use integrity interval instead of
sector as seed") doesn't need a Fixes tag. Still, I'm confused why the
auto-integrity code bothers setting the seed to the sector number in
the first place if it's going to be remapped later. Why not just leave
the seed zeroed?

Best,
Caleb

>
> > I agree, the seed doesn't need to match the final LBA, but it does
> > need to be in *units* of logical blocks, plus some constant offset.
>
> Your concept of "unit" still sends the wrong message. The seed is an
> integer value used to initialize a counter or hardware register. The
> seed only has meaning to whichever entity submits the I/O. To everything
> else it is a value used for remapping ref tags from the I/O submitter's
> point of view to whichever interpretation is mandated by the storage
> hardware's PI format.
>
> > With a ublk device. It should affect any block device that supports
> > integrity and has a logical block size > 512.
>
> It sounds like the seed value is set incorrectly for reads in your
> configuration.
>
> --
> Martin K. Petersen



More information about the Linux-nvme mailing list