[Lsf-pc] [LSF/MM/BPF ATTEND][LSF/MM/BPF TOPIC] Meta/Integrity/PI improvements

Hannes Reinecke hare at suse.de
Tue Apr 2 04:37:19 PDT 2024


On 4/2/24 12:45, Dongyang Li wrote:
> Martin, Kanchan,
>>
>> Kanchan,
>>
>>> - Generic user interface that user-space can use to exchange meta.
>>> A new io_uring opcode IORING_OP_READ/WRITE_META - seems feasible
>>> for direct IO.
>>
>> Yep. I'm interested in this too. Reviving this effort is near the top
>> of my todo list so I'm happy to collaborate.
> If we are going to have an interface to exchange meta/integrity with
> user-space, could we also have an in-kernel interface to do the same?
> 
> It would be useful for some network filesystem/block device drivers
> like nbd/drbd/NVMe-oF to use blk-integrity as network checksum, and the
> same checksum covers the I/O on the server as well.
> 
> The integrity can be generated on the client and sent over the network;
> on the server, blk-integrity can just offload it to storage.
> Verify follows the same principle: on the server, blk-integrity gets
> the PI from storage using the interface and sends it over the network;
> on the client we can do the usual verify.
> 
> In the past we tried to achieve this: there's a patch to add optional
> generate/verify functions that take priority over the ones from the
> integrity profile, and the optional generate/verify functions do the
> meta/PI exchange, but that didn't get traction. It would be much better
> if we could have a bio interface for this.
> 
Not sure if I understand.
The key point of PI is that there _is_ hardware interaction on the disk
side, and that you can store/offload PI to the hardware.
That PI data can be transferred via the transport up to the application,
and the application can validate it.
I do see the case for nbd (in the sense that nbd should be enabled to
hand down PI information if it receives it). NVMe-oF is trying to use
PI (which is what this topic is about).
But drbd?
What do you want to achieve? Sure, drbd should be PI-enabled, but I can't
really see how it would forward PI information; essentially drbd is a
network-based RAID1, so what should happen with the PI information?
Should drbd try to combine PI information from both legs?
Is the PI information from both legs required to be the same?
Incidentally, the same question would apply to 'normal' RAID1.
In the end, I'm tempted to declare PI to be terminated at that
level to treat everything the same.
But I'd be open to discussion here.
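
For readers less familiar with what "generate" and "verify" mean here, a
minimal user-space sketch follows. It assumes T10 Type 1 protection
information: an 8-byte big-endian tuple per sector made of a CRC16 guard tag
(polynomial 0x8BB7), an application tag, and a reference tag holding the low
32 bits of the LBA. The function names are hypothetical and this is not the
kernel's blk-integrity code; it only illustrates the tuple that would travel
with the data.

```python
import struct

T10_DIF_POLY = 0x8BB7  # CRC-16/T10-DIF polynomial, init 0, no reflection


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF over data (slow, but easy to follow)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def generate_pi(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build one 8-byte tuple: guard tag, app tag, reference tag.

    Assumption: Type 1 PI, so the reference tag is the low 32 bits of the LBA.
    """
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def verify_pi(sector: bytes, pi: bytes, lba: int) -> bool:
    """Check the guard tag against the data and the reference tag against the LBA."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (lba & 0xFFFFFFFF)


if __name__ == "__main__":
    sector = bytes(range(256)) * 2           # one 512-byte sector
    pi = generate_pi(sector, lba=1234)       # e.g. done on the client
    print(verify_pi(sector, pi, lba=1234))   # e.g. done by the receiver
```

In the scheme described in the quoted text, something like generate_pi would
run on the client before the write goes over the network, and something like
verify_pi would run on the client after a read returns, with the server and
storage passing the tuple through unchanged.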

Cheers,

Hannes



