[LSF/MM/BPF TOPIC] File system checksum offload
Kanchan Joshi
joshi.k at samsung.com
Tue Mar 18 00:06:44 PDT 2025
On 2/4/2025 10:46 AM, hch at infradead.org wrote:
> On Mon, Feb 03, 2025 at 06:57:13PM +0530, Kanchan Joshi wrote:
>> But, patches do exactly that i.e., hardware csum support. And posted
>> numbers [*] are also when hardware is checksumming the data blocks.
>
> I'm still not sure why you think the series implements hardware
> csum support.
The series ensures that (a) the host does not compute the csum, and (b)
the device computes it.
Not sure if you were doubting the HW instead, but I checked that part
with a user-space nvme-passthrough program which
- [During write] does not send checksum and sets PRACT as 1.
- [During read] sends metadata buffer and keeps PRACT as 0.
The read returns the correct data checksum, which the host never computed
(the device did, at the time of the write).
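For illustration, the two commands in that experiment can be sketched as below. This is only a sketch: `struct pt_cmd` and the two builder functions are hypothetical stand-ins (not the kernel uapi passthrough struct), and nothing is submitted to a device. The PRACT position (bit 29 of CDW12, within the PRINFO field) and the read/write opcodes follow the NVMe NVM command set as I understand it. The write without a metadata buffer assumes a format where meta-size == pi-size.

```c
#include <stdint.h>

/* PRACT sits in the PRINFO field of read/write CDW12 (assumed: bit 29). */
#define CDW12_PRACT (1u << 29)

/* Hypothetical stand-in for the fields a passthrough command would carry. */
struct pt_cmd {
	uint8_t  opcode;        /* 0x01 = write, 0x02 = read */
	uint64_t metadata;      /* user address of metadata buffer, 0 if none */
	uint32_t metadata_len;  /* metadata buffer length in bytes */
	uint32_t cdw12;         /* NLB (zero-based) in bits 15:0, PRINFO above */
};

/* Write: host sends no checksum; PRACT=1 asks the device to generate PI. */
static struct pt_cmd build_write_pract(uint16_t nlb_zero_based)
{
	struct pt_cmd c = {0};

	c.opcode = 0x01;
	c.cdw12 = (uint32_t)nlb_zero_based | CDW12_PRACT;
	return c;
}

/* Read: PRACT=0 and a metadata buffer, so the stored PI comes back to the host. */
static struct pt_cmd build_read_with_meta(uint16_t nlb_zero_based,
					  void *meta, uint32_t meta_len)
{
	struct pt_cmd c = {0};

	c.opcode = 0x02;
	c.metadata = (uint64_t)(uintptr_t)meta;
	c.metadata_len = meta_len;
	c.cdw12 = (uint32_t)nlb_zero_based; /* PRACT left clear */
	return c;
}
```

Comparing the guard tags returned by the read against a checksum computed separately over the same data is what shows that the device, not the host, generated them.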
> The buf mode is just a duplicate implementation of the block layer
> automatic PI. The no buf means PRACT which lets the device auto
> generate and strip PI.
Regardless of buf or no buf, the series applies PRACT and only the device
computes the checksum. The two modes take shape only because of the way
PRACT works for two different device configurations:
#1: when meta-size == pi-size, we don't need to send meta-buffer.
#2: when meta-size > pi-size, we need to.
Automatic PI helps for #2, as split handling of the meta-buffer comes for
free if the I/O is split. But overall, this is also about abstracting the
PRACT details so that each filesystem does not have to deal with them.
And changes to keep this abstracted in Auto-PI/NVMe are not much:
 block/bio-integrity.c    | 42 ++++++++++++++++++++++++++++++++++++++-
 block/t10-pi.c           |  7 +++++++
 drivers/nvme/host/core.c | 24 ++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  1 +
> Especially the latter one (which is the
> one that was benchmarked) literally provides no additional protection
> over what the device would already do. It's the "trust me, bro" of
> data integrity :) Which to be fair will work pretty well as devices
> that support PI are the creme de la creme of storage devices and
> will have very good internal data protection internally. But the
> point of data checksums is to not trust the storage device and
> not trust layers between the checksum generation and the storage
> device.
Right, I'm not saying that protection is getting better. Just that any
offload is about trusting someone else with the job. We have other
instances like atomic-writes, copy, write-zeroes, write-same etc.
> IFF using PRACT is an acceptable level of protection just running
> NODATASUM and disabling PI generation/verification in the block
> layer using the current sysfs attributes (or an in-kernel interface
> for that) to force the driver to set PRACT will do exactly the same
> thing.
I had considered that, but it can't work because:
- The sysfs attributes operate at the block-device level, for all reads or
all writes. That is not flexible enough for policies such as "do something
for some writes/reads but not for others", which can translate to "do
checksum offload for FS data, but keep things as is for FS meta" or
other combinations.
- If the I/O goes down to driver with , driver will start failing
(rather than setting PRACT) if the configuration is "meta-size >
pi-size". This part in nvme_setup_rw:
	if (!blk_integrity_rq(req)) {
		if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
			return BLK_STS_NOTSUPP;
		control |= NVME_RW_PRINFO_PRACT;
	}
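To make the failure case concrete, here is a small user-space model of that check. Everything prefixed `model_`/`MODEL_` is hypothetical. It assumes that `nvme_ns_has_pi()` amounts to "a PI type is enabled and the metadata size equals the PI size", and that the PRACT flag is bit 13 of the 16-bit control field; both are my reading of the driver, not something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

enum model_status { MODEL_STS_OK, MODEL_STS_NOTSUPP };

/* Assumed to mirror NVME_RW_PRINFO_PRACT in the 16-bit control field. */
#define MODEL_PRINFO_PRACT (1u << 13)

/* Hypothetical subset of the namespace head fields used by the check. */
struct model_ns {
	uint8_t  pi_type; /* 0 means PI disabled */
	uint16_t ms;      /* metadata size per block */
	uint16_t pi_size; /* PI size per block */
};

/* Assumption: PI is usable without a buffer only if it fills the metadata. */
static bool model_ns_has_pi(const struct model_ns *ns)
{
	return ns->pi_type && ns->ms == ns->pi_size;
}

/* Models the branch taken when a request arrives with no integrity payload. */
static enum model_status model_setup_rw(const struct model_ns *ns,
					bool has_integrity, uint16_t *control)
{
	*control = 0;
	if (ns->ms && !has_integrity) {
		if (!model_ns_has_pi(ns))
			return MODEL_STS_NOTSUPP; /* meta-size > pi-size fails */
		*control |= MODEL_PRINFO_PRACT;   /* meta-size == pi-size */
	}
	return MODEL_STS_OK;
}
```

With `ms == pi_size == 8` and no integrity payload, the model sets PRACT. With `ms == 64` and `pi_size == 8`, it returns the not-supported status instead, which is the behaviour described in the second point above.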