[PATCH v2 2/3] block: support PI at non-zero offset within metadata
Javier González
javier.gonz at samsung.com
Wed Oct 2 03:29:50 PDT 2024
On 01.10.2024 09:37, Keith Busch wrote:
>On Tue, Oct 01, 2024 at 09:27:08AM +0200, Javier González wrote:
>> On 30.09.2024 13:57, Martin K. Petersen wrote:
>> >
>> > Kanchan,
>> >
>> > > I spent a good deal of time on this today. I was thinking of connecting
>> > > the block read_verify/write_generate knobs to influence things at the
>> > > nvme level (those PRCHK flags). But that will not be enough, because with
>> > > those knobs cleared the block layer will not attach a metadata buffer,
>> > > which is still needed.
>> > >
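For reference, these are the per-disk integrity sysfs knobs being referred to.
A minimal userspace sketch follows, with "nvme0n1" as an example device name
(illustrative only, not taken from this thread); as noted above, clearing them
is not sufficient by itself, because the block layer then also stops attaching
the metadata buffer:

/*
 * Hedged sketch: clear the block-layer integrity knobs for one disk.
 * "nvme0n1" is an example device name.
 */
#include <stdio.h>

static int write_knob(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fputs(val, f);
	fclose(f);
	return 0;
}

int main(void)
{
	/* Disable PI verification on reads and PI generation on writes. */
	write_knob("/sys/block/nvme0n1/integrity/read_verify", "0");
	write_knob("/sys/block/nvme0n1/integrity/write_generate", "0");
	return 0;
}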
>> > > The data was written while the nvme driver set the pi_type to 0 during
>> > > integrity registration (even though at the device level it was
>> > > non-zero).
>> > >
>> > > I am thinking about whether it would make sense to have a knob at the
>> > > block-layer level to do something like that, i.e., override the
>> > > registered integrity profile with a nop.
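As a rough illustration of the registration difference being described here, a
loose sketch that paraphrases the pre-6.11 in-kernel integrity API from memory;
exact struct fields and symbol names vary across kernel versions, and the
helper below is hypothetical:

/*
 * Loose, hypothetical sketch only.  With no profile supplied,
 * blk_integrity_register() falls back to a nop profile, so the block
 * layer still attaches a metadata buffer to each bio but never
 * generates or verifies protection information.
 */
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
#include <linux/t10-pi.h>

static void example_register_integrity(struct gendisk *disk, u8 pi_type,
					u8 metadata_size)
{
	struct blk_integrity integrity = { };

	if (pi_type)
		/* Real Type 1 PI: guard tags are generated and checked. */
		integrity.profile = &t10_pi_type1_crc;
	else
		/* pi_type == 0: effectively a nop registration. */
		integrity.profile = NULL;

	integrity.tuple_size = metadata_size;
	blk_integrity_register(disk, &integrity);
}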
>> >
>> > SCSI went to great lengths to ensure that invalid protection information
>> > would never be written during normal operation, regardless of whether
>> > the host sent PI or not. And thus the only time one would anticipate a
>> > PI error was if the data had actually been corrupted.
>> >
>>
>> Is this something we should work on bringing to the NVMe TWG?
>
>Maybe. It looks like they did the spec this way on purpose, with the
>ability to toggle guard tags per IO.
>
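To make the per-IO toggle concrete, here is a hedged userspace sketch using the
NVMe passthrough ioctl, where the PRINFO bits in CDW12 choose the protection
checks for that single command. The device path, nsid, LBA, and the 4096-byte
data / 16-byte metadata format are example assumptions, not taken from this
thread:

/*
 * Hedged sketch: read one block via NVMe passthrough with per-command
 * control of the protection-information check bits.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	uint8_t data[4096];	/* assumes a 4096-byte data block */
	uint8_t meta[16];	/* assumes 16 bytes of separate metadata */
	struct nvme_passthru_cmd cmd;
	int fd = open("/dev/nvme0n1", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(&cmd, 0, sizeof(cmd));
	cmd.opcode = 0x02;		/* NVMe Read */
	cmd.nsid = 1;
	cmd.addr = (uintptr_t)data;
	cmd.data_len = sizeof(data);
	cmd.metadata = (uintptr_t)meta;
	cmd.metadata_len = sizeof(meta);
	cmd.cdw10 = 0;			/* starting LBA, low 32 bits */
	cmd.cdw11 = 0;			/* starting LBA, high 32 bits */
	/*
	 * CDW12: bits 15:0 = number of blocks minus one, bits 29:26 = PRINFO.
	 * Bit 28 enables the guard-tag check for this command only; leaving
	 * it clear asks the controller to skip the guard check for this IO.
	 */
	cmd.cdw12 = (1u << 28);		/* NLB = 0 (one block), guard check on */

	if (ioctl(fd, NVME_IOCTL_IO_CMD, &cmd) < 0)
		perror("NVME_IOCTL_IO_CMD");

	close(fd);
	return 0;
}

Whether a separate metadata buffer has to be supplied, and how PRACT interacts
with the metadata size, depends on the namespace format.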
>Just some more background on this because it may sound odd to use a data
>protection namespace format that the kernel didn't support:
>
>In this use case, writes to the device primarily come from the
>passthrough interface, which could always use the guard tags for
>end-to-end protection. The kernel block IO was the only path that had
>the limitation.
>
>Besides the passthrough interface, though, the setup uses the kernel block
>layer to write the partition tables. After upgrading from 6.8 to 6.9, the
>kernel won't be able to read the partition table on these devices. I'm
>still not sure of the best way to handle this, though.
Mmmm. I need to think more about this.
This seems to be by design in NVMe. I will try to get back to Mike and
Judy to understand the background for this decision at the time...