[Lsf-pc] [LSF/MM/BPF ATTEND][LSF/MM/BPF TOPIC] Meta/Integrity/PI improvements

Kanchan Joshi joshi.k at samsung.com
Wed Mar 27 06:45:56 PDT 2024


On 2/27/2024 4:45 AM, Martin K. Petersen wrote:
> 
> Kanchan,
> 
>> - Generic user interface that user-space can use to exchange meta. A
>> new io_uring opcode IORING_OP_READ/WRITE_META - seems feasible for
>> direct IO.
> 
> Yep. I'm interested in this too. Reviving this effort is near the top of
> my todo list so I'm happy to collaborate.

The first cut is here:
https://lore.kernel.org/linux-block/20240322185023.131697-1-joshi.k@samsung.com/

I am not sure how close it is to the requirements you have, so feedback
will help.
Perhaps the interface needs a way to specify which checks (guard, apptag,
reftag) are desired. That is doable, but it will require introducing three
new RWF_* flags.
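
To make the three-flag idea concrete, here is a rough sketch of how
per-check flags could be translated into the PRCHK bits of an NVMe
read/write command. The EX_RWF_CHK_* names and values are hypothetical and
are not part of the posted series. The PRINFO bit positions are meant to
mirror the NVME_RW_PRINFO_* definitions in include/linux/nvme.h, but they
are redefined locally here so the snippet stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-I/O check flags: names and values are illustrative only. */
#define EX_RWF_CHK_GUARD   (1u << 0)
#define EX_RWF_CHK_APPTAG  (1u << 1)
#define EX_RWF_CHK_REFTAG  (1u << 2)
#define EX_RWF_CHK_ALL \
	(EX_RWF_CHK_GUARD | EX_RWF_CHK_APPTAG | EX_RWF_CHK_REFTAG)

/*
 * PRINFO bits in the NVMe read/write control field. Local copies, assumed
 * to match NVME_RW_PRINFO_* in include/linux/nvme.h.
 */
#define EX_PRINFO_PRCHK_REF    (1u << 10)
#define EX_PRINFO_PRCHK_APP    (1u << 11)
#define EX_PRINFO_PRCHK_GUARD  (1u << 12)

/* Translate the requested checks into PRCHK control bits. */
static uint16_t ex_chk_flags_to_prinfo(unsigned int chk_flags)
{
	uint16_t control = 0;

	/* Reject anything outside the three known check flags. */
	assert((chk_flags & ~EX_RWF_CHK_ALL) == 0);

	if (chk_flags & EX_RWF_CHK_GUARD)
		control |= EX_PRINFO_PRCHK_GUARD;
	if (chk_flags & EX_RWF_CHK_APPTAG)
		control |= EX_PRINFO_PRCHK_APP;
	if (chk_flags & EX_RWF_CHK_REFTAG)
		control |= EX_PRINFO_PRCHK_REF;

	return control;
}
```

With this shape, an application that wants only guard and reftag checking
would pass EX_RWF_CHK_GUARD | EX_RWF_CHK_REFTAG and leave apptag alone.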

>> NVMe SSD can do the offload when the host sends the PRACT bit. But in
>> the driver, this is tied to global integrity disablement using
>> CONFIG_BLK_DEV_INTEGRITY.
> 
>> So, the idea is to introduce a bio flag REQ_INTEGRITY_OFFLOAD
>> that the filesystem can send. The block-integrity and NVMe driver do
>> the rest to make the offload work.
> 
> Whether to have a block device do this is currently controlled by the
> /sys/block/foo/integrity/{read_verify,write_generate} knobs.

Right. This works when the host does not need to pass a buffer (meta-size
equals pi-size).
But when meta-size is greater than pi-size, the meta-buffer must be
allocated. Some changes are required so that block-integrity does that
allocation without having to do read_verify/write_generate.
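
The distinction between the two cases boils down to a size comparison.
A minimal sketch, with a hypothetical helper name that does not exist in
the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide whether a metadata buffer must be allocated
 * on the host for a PRACT=1 transfer.
 *
 *   meta_size == pi_size: the device inserts/strips the PI itself, so no
 *                         buffer needs to be passed.
 *   meta_size >  pi_size: a meta-buffer has to be allocated and passed.
 */
static bool ex_needs_host_meta_buffer(size_t meta_size, size_t pi_size)
{
	/* The PI must fit inside the per-block metadata. */
	assert(pi_size > 0 && pi_size <= meta_size);

	return meta_size > pi_size;
}
```

For example, a namespace formatted with 8 bytes of metadata and 8-byte PI
needs no buffer, while one with 64 bytes of metadata and 8-byte PI does.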

> At least
> for SCSI, protected transfers are always enabled between HBA and target
> if both support it. If no integrity has been attached to an I/O by the
> application/filesystem, the block layer will do so controlled by the
> sysfs knobs above. IOW, if the hardware is capable, protected transfers
> should always be enabled, at least from the block layer down.
> It's possible that things don't work quite that way with NVMe since, at
> least for PCIe, the drive is both initiator and target. And NVMe also
> missed quite a few DIX details in its PI implementation. It's been a
> while since I messed with PI on NVMe, I'll have a look.

See the PRACT=1 case (Figure 9 and Section 5.2.2) in the spec:
https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-1.0d-2023.12.28-Ratified.pdf

I am not sure whether SCSI has an equivalent of this bit.


> But in any case the intent for the Linux code was for protected
> transfers to be enabled automatically when possible. If the block layer
> protection is explicitly disabled, a filesystem can still trigger
> protected transfers via the bip flags. So that capability should
> definitely be exposed via io_uring.
> 
>> "Work is in progress to implement support for the data integrity
>> extensions in btrfs, enabling the filesystem to use the application
>> tag."
> 
> This didn't go anywhere for a couple of reasons:

Thanks, this was very helpful!


