[PATCH 03/11] block: remove the BIP_IP_CHECKSUM flag

Martin K. Petersen martin.petersen at oracle.com
Wed Jun 12 10:27:47 PDT 2024


Christoph,

>> > Note that unlike the NOCHECK flags which I just cleaned up because they
>> > were unused, this one actually does get in the way of the architecture
>> > of the whole series :( We could add a per-bip csum_type but it would
>> > feel really weird.
>> 
>> Why would it feel weird? That's how it currently works.
>
> Because there's no way to have it set to anything but the per-queue
> one.

That's what the io_uring passthrough changes enable.

Note that the IP checksum is an optional performance feature. A SCSI
controller supporting IP-to-CRC conversion does not imply that all
submitted metadata must use IP checksum format.

The T10 CRC used to be painfully slow to calculate prior to processors
growing support for pclmulqdq or similar. Hence the optional IP
checksum. But on a modern CPU, the T10 CRC can often be calculated fast
enough that it is less of a performance impediment.
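
To make the cost difference concrete, here is a minimal userspace sketch
of the two guard tag computations. It is not the kernel implementation:
the function names are made up for illustration, the CRC is the plain
bit-at-a-time form of the T10 DIF polynomial (0x8BB7, initial value 0),
and the IP checksum follows the RFC 1071 one's-complement sum over
big-endian 16-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bitwise CRC-16 with the T10 DIF polynomial.
 * Eight shift/xor steps per input byte, which is why this was slow
 * before table-driven or carry-less-multiply implementations. */
uint16_t sketch_crc_t10dif(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical helper: RFC 1071 Internet checksum.
 * One addition per 16-bit word, then fold the carries and complement. */
uint16_t sketch_ip_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (i < len)                      /* odd trailing byte, zero padded */
        sum += (uint32_t)buf[i] << 8;
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The CRC loop does eight dependent steps per byte while the IP checksum
does a single add per two bytes, which is roughly the gap that hardware
support for carry-less multiplication narrows.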

The interface was explicitly designed so that the entity which creates
the metadata decides which checksum it wants to use. And then it uses
the bip flag to communicate that to the HBA. The patch which allowed the
user to set the desired guard tag format for block layer-owned PI fell
by the wayside, apparently. Possibly we lost track of it because the T10
CRC hardware offload changes took a while to land.

Note that I would personally love to get rid of the IP checksum
altogether but I think it's too soon to make it obsolete. Still a lot of
SCSI stuff out there which runs in IP checksum mode. And it is still a
bit faster than CRC for many workloads. And as long as it is in use, we
need the ability to support it and qualify it.

All I'm asking is that we retain the ability to disable checking at the
controller level and at the target level. And that the optional IP
checksum can be selected on a per-I/O basis. IOW, please just retain the
three existing bip flags. Happy to look into what polarity-reversal
would look like but I don't think that should hold up your series.
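
As a rough illustration of what per-I/O selection means, the following
standalone sketch models three flags and how a submitter's choice could
be turned into a per-request plan. Everything here is hypothetical: the
SKETCH_ names only mirror the flags discussed in this thread, the bit
values are arbitrary, and none of it is kernel code.

```c
#include <stdbool.h>

/* Hypothetical per-I/O flags; bit positions are illustrative only. */
enum sketch_bip_flags {
    SKETCH_BIP_CTRL_NOCHECK = 1u << 0, /* controller skips verification */
    SKETCH_BIP_DISK_NOCHECK = 1u << 1, /* target skips verification */
    SKETCH_BIP_IP_CHECKSUM  = 1u << 2, /* guard tag is an IP checksum */
};

enum sketch_guard { SKETCH_GUARD_CRC, SKETCH_GUARD_IP };

struct sketch_io_plan {
    enum sketch_guard guard; /* format the metadata creator chose */
    bool ctrl_checks;        /* controller verifies the guard tag */
    bool disk_checks;        /* target verifies the guard tag */
};

/* Derive the plan from the flags carried by a single request, rather
 * than from a queue-wide setting. */
struct sketch_io_plan sketch_plan_io(unsigned int flags)
{
    struct sketch_io_plan plan;

    plan.guard = (flags & SKETCH_BIP_IP_CHECKSUM) ? SKETCH_GUARD_IP
                                                  : SKETCH_GUARD_CRC;
    plan.ctrl_checks = !(flags & SKETCH_BIP_CTRL_NOCHECK);
    plan.disk_checks = !(flags & SKETCH_BIP_DISK_NOCHECK);
    return plan;
}
```

In this model a qualification test could, for example, set only the
controller no-check flag on a request carrying a deliberately bad guard
tag and then expect the target, not the controller, to reject it.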

>> The qualification tool issues a flurry of commands injecting errors at
>> various places in the stack to verify that the right entity (block
>> layer, controller, storage device) catches a bad checksum, reference
>> tag, etc.
>
> How does it do that?  There's no actual way to make it mismatch.

Through a custom passthrough driver that we want to get rid of and
replace with the io_uring interface series.

-- 
Martin K. Petersen	Oracle Linux Engineering