[LSF/MM/BPF ATTEND][LSF/MM/BPF TOPIC] Meta/Integrity/PI improvements
Keith Busch
kbusch at kernel.org
Thu Feb 22 12:08:54 PST 2024
On Fri, Feb 23, 2024 at 01:03:01AM +0530, Kanchan Joshi wrote:
> With respect to the current state of Meta/Block-integrity, there are
> some missing pieces.
> I can improve some of it. But not sure if I am up to speed on the
> history behind the status quo.
>
> Hence, this proposal to discuss the pieces.
>
> Maybe people would like to discuss other points too, but I have the
> following:
>
> - Generic user interface that user-space can use to exchange meta. A
> new io_uring opcode IORING_OP_READ/WRITE_META - seems feasible for
> direct IO. Buffered IO seems non-trivial as a relatively smaller meta
> needs to be written into/read from the page cache. The related
> metadata must also be written during the writeback (of data).
>
>
> - Is there interest in filesystem leveraging the integrity capabilities
> that almost every enterprise SSD has.
> Filesystems lacking checksumming abilities can still ask the SSD to do
> it and be more robust.
> And for BTRFS - there may be value in offloading the checksum to SSD.
> Either to save the host CPU or to get more usable space (by not
> writing the checksum tree). The mount option 'nodatasum' can turn off
> the data checksumming, but more needs to be done to make the offload
> work.

As I understand it, btrfs's checksums are on a variable extent size, but
offloading it to the SSD would do it per block, so it's forcing a new
on-disk format. It would be cool to use it, though: you could atomically
update data and checksums without stable pages.
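
To make "per block" concrete, here is a rough sketch of generating
protection information for a buffer, assuming the 8-byte T10 DIF Type 1
layout (16-bit guard CRC, 16-bit application tag, 32-bit reference tag
taken from the low bits of the LBA). The names pi_tuple, crc16_t10dif and
pi_generate are made up for illustration and are not kernel API; the
kernel's own tuple is struct t10_pi_tuple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte PI tuple, one per logical block.
 * Fields are big-endian on the wire, so they are kept as raw bytes. */
struct pi_tuple {
    uint8_t guard[2];   /* CRC16 over this block's data only */
    uint8_t app[2];     /* application tag, opaque to the device */
    uint8_t ref[4];     /* reference tag: low 32 bits of the LBA (assumed Type 1) */
};

/* Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no reflection. */
uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Fill one tuple per block_size bytes of data; returns the number of tuples.
 * Each guard covers exactly one block, unlike a checksum over a whole extent. */
size_t pi_generate(const uint8_t *data, size_t len, size_t block_size,
                   uint64_t start_lba, uint16_t app_tag, struct pi_tuple *out)
{
    assert(block_size > 0 && len % block_size == 0);

    size_t nblocks = len / block_size;

    for (size_t i = 0; i < nblocks; i++) {
        uint16_t guard = crc16_t10dif(data + i * block_size, block_size);
        uint32_t ref = (uint32_t)(start_lba + i);

        out[i].guard[0] = (uint8_t)(guard >> 8);
        out[i].guard[1] = (uint8_t)guard;
        out[i].app[0] = (uint8_t)(app_tag >> 8);
        out[i].app[1] = (uint8_t)app_tag;
        out[i].ref[0] = (uint8_t)(ref >> 24);
        out[i].ref[1] = (uint8_t)(ref >> 16);
        out[i].ref[2] = (uint8_t)(ref >> 8);
        out[i].ref[3] = (uint8_t)ref;
    }
    return nblocks;
}
```

With the offload, the device would generate and check these tuples
itself, so the host never computes or stores them.
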
> NVMe SSD can do the offload when the host sends the PRACT bit. But in
> the driver, this is tied to global integrity disablement using
> CONFIG_BLK_DEV_INTEGRITY.
> So, the idea is to introduce a bio flag REQ_INTEGRITY_OFFLOAD
> that the filesystem can send. The block-integrity and NVMe driver do
> the rest to make the offload work.
>
> - Currently, block integrity uses guard and ref tags but not application
> tags.
> As per Martin's paper [*]:
>
> "Work is in progress to implement support for the data
> integrity extensions in btrfs, enabling the filesystem
> to use the application tag."