[LSF/MM/BPF ATTEND][LSF/MM/BPF TOPIC] Meta/Integrity/PI improvements

Kanchan Joshi joshi.k at samsung.com
Fri Feb 23 04:41:36 PST 2024


On 2/23/2024 1:38 AM, Keith Busch wrote:
> On Fri, Feb 23, 2024 at 01:03:01AM +0530, Kanchan Joshi wrote:
>> With respect to the current state of Meta/Block-integrity, there are
>> some missing pieces.
>> I can improve some of it. But not sure if I am up to speed on the
>> history behind the status quo.
>>
>> Hence, this proposal to discuss the pieces.
>>
>> Maybe people would like to discuss other points too, but I have the
>> following:
>>
>> - Generic user interface that user-space can use to exchange meta. A
>> new io_uring opcode IORING_OP_READ/WRITE_META - seems feasible for
>> direct IO. Buffered IO seems non-trivial, as the relatively small
>> metadata needs to be written into/read from the page cache. The related
>> metadata must also be written during the writeback (of data).
>>
>>
>> - Is there interest in filesystems leveraging the integrity capabilities
>> that almost every enterprise SSD has?
>> Filesystems lacking checksumming abilities can still ask the SSD to do
>> it and be more robust.
>> And for BTRFS - there may be value in offloading the checksum to SSD.
>> Either to save the host CPU or to get more usable space (by not
>> writing the checksum tree). The mount option 'nodatasum' can turn off
>> the data checksumming, but more needs to be done to make the offload
>> work.
> 
> As I understand it, btrfs's checksums are on a variable extent size, but
> offloading it to the SSD would do it per block, so it's forcing a new
> on-disk format. It would be cool to use it, though: you could atomically
> update data and checksums without stable pages.
>   

Yes, the extents are variable-sized, but btrfs computes a checksum for
each FS block (4K-64K, practically 4K) within each extent.
An on-disk format change will not be needed, because in this approach the
FS (and block-integrity) does not really deal with checksums. It only asks
the device to compute/verify them.
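
To illustrate the per-block granularity, here is a minimal user-space
sketch, not btrfs code. It assumes crc32c (the btrfs default algorithm)
and a fixed FS block size; csum_extent() and its parameters are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Hypothetical helper: checksum an extent one FS block at a time.
 * Returns the number of checksums stored in csums[], or 0 if the extent
 * is not a whole number of blocks or csums[] is too small.
 */
static size_t csum_extent(const uint8_t *extent, size_t extent_len,
			  size_t block_size, uint32_t *csums,
			  size_t max_csums)
{
	size_t nr, i;

	assert(extent && csums);
	if (block_size == 0 || extent_len % block_size)
		return 0;

	nr = extent_len / block_size;
	if (nr > max_csums)
		return 0;

	/* One checksum per block, independent of the extent length. */
	for (i = 0; i < nr; i++)
		csums[i] = crc32c(extent + i * block_size, block_size);

	return nr;
}
```

A 12K extent with a 4K block size yields three independent checksums,
which is the same granularity at which the device would generate and
verify protection information if each 4K FS block maps to one
logical block.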

Am I missing your point?

>> NVMe SSD can do the offload when the host sends the PRACT bit. But in
>> the driver, this is tied to global integrity disablement using
>> CONFIG_BLK_DEV_INTEGRITY.
>> So, the idea is to introduce a bio flag REQ_INTEGRITY_OFFLOAD
>> that the filesystem can send. The block-integrity and NVMe driver do
>> the rest to make the offload work.
>>
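
A rough user-space sketch of how such a flag could map to the PRINFO bits
of an NVMe read/write command. The NVME_RW_PRINFO_* values mirror
include/linux/nvme.h. The bit value of REQ_INTEGRITY_OFFLOAD and the
helper rw_control() are assumptions made up for this example, and the
logic is a simplification, not the actual driver code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * PRINFO bits in the 16-bit control field of an NVMe read/write command.
 * The control field is the upper half of Command Dword 12, so PRACT ends
 * up as bit 29 of CDW12.
 */
enum {
	NVME_RW_PRINFO_PRCHK_REF   = 1 << 10,
	NVME_RW_PRINFO_PRCHK_APP   = 1 << 11,
	NVME_RW_PRINFO_PRCHK_GUARD = 1 << 12,
	NVME_RW_PRINFO_PRACT       = 1 << 13,
};

/* Proposed flag; this bit position is hypothetical. */
#define REQ_INTEGRITY_OFFLOAD	(1u << 0)

/*
 * Hypothetical helper: build the control field for a request.
 * pi_type is the namespace protection type: 0 (none), 1, 2 or 3.
 */
static uint16_t rw_control(uint32_t req_flags, int pi_type)
{
	uint16_t control = 0;

	assert(pi_type >= 0 && pi_type <= 3);
	if (pi_type == 0)
		return 0;	/* namespace not formatted with PI */

	/* Offload: the controller generates PI on write, checks it on read. */
	if (req_flags & REQ_INTEGRITY_OFFLOAD)
		control |= NVME_RW_PRINFO_PRACT;

	/* The guard is always checked; the ref tag only for Type 1 and 2. */
	control |= NVME_RW_PRINFO_PRCHK_GUARD;
	if (pi_type == 1 || pi_type == 2)
		control |= NVME_RW_PRINFO_PRCHK_REF;

	return control;
}
```

With this mapping, a filesystem that sets the flag on a Type 1 namespace
gets PRACT plus guard and ref-tag checking, and never has to allocate or
interpret an integrity buffer itself.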




