[PATCH 0/6] block: add support for REQ_OP_VERIFY

Chaitanya Kulkarni chaitanyak at nvidia.com
Thu Dec 1 10:12:46 PST 2022


On 7/6/22 10:42, Matthew Wilcox wrote:
> On Thu, Jun 30, 2022 at 02:14:00AM -0700, Chaitanya Kulkarni wrote:
>> This adds support for the REQ_OP_VERIFY. In this version we add
> 
> IMO, VERIFY is a useless command.  The history of storage is full of
> devices which simply lie.  Since there's no way for the host to check if
> the device did any work, cheap devices may simply implement it as a NOOP.

Over the past few months, at various storage conferences, I've talked
to different people about your comment that devices may lie about
their verify implementation or even implement it as a NOOP.

With all due respect, this is not true.

The Verify command for NVMe has a significant impact when it comes to
doing maintenance work during downtime, work that can otherwise get in
the way when the device is under heavy workload, e.g. a social media
website at peak hours or high-performance trading hours. In such a
scenario, the controller doing the maintenance work can increase the
tail latency of the workload, and one can see spikes as the SSD starts
aging. This is also true for file systems and databases when complex
queries generate complex I/O patterns under heavy I/O.

I respect your experience, but calling every single SSD and
enterprise-grade SSD vendor a liar is just not true: the enterprise SSD
qualification process is extremely detail-oriented, and each feature
and command gets an excruciating design review, including
performance/watt.

So nobody can get away with a lie.

This patch series has an option to emulate the verify operation. If you
or anyone else doesn't want to trust the device, one can simply turn
off the verify command and let the kernel emulate Verify; that will
also save a ton of bandwidth, especially in the case of fabrics. If
that option is not there, I'll make sure to have it in the final
version.
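
As a rough illustration only (this is not code from the patch series),
emulating verify amounts to reading the range and throwing the data
away. Below is a minimal user-space sketch in C; emulate_verify() and
VERIFY_CHUNK are hypothetical names, and a real media check would also
need O_DIRECT with aligned buffers so the reads are not served from the
page cache:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define VERIFY_CHUNK (64 * 1024)	/* hypothetical scratch buffer size */

/*
 * Hypothetical user-space analogue of an emulated verify: read the byte
 * range [offset, offset + len) from fd and discard the data.
 * Returns 0 if every byte was readable, or a negative errno value.
 */
int emulate_verify(int fd, off_t offset, size_t len)
{
	char *buf;
	int ret = 0;

	assert(fd >= 0);

	buf = malloc(VERIFY_CHUNK);
	if (!buf)
		return -ENOMEM;

	while (len > 0) {
		size_t want = len < VERIFY_CHUNK ? len : VERIFY_CHUNK;
		ssize_t got = pread(fd, buf, want, offset);

		if (got < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the same chunk */
			ret = -errno;
			break;
		}
		if (got == 0) {
			/* range extends past the end of the file or device */
			ret = -EIO;
			break;
		}
		offset += got;
		len -= (size_t)got;
	}

	free(buf);
	return ret;
}
```

Unlike a device-side Verify, this sketch still moves the data to the
host, so it only shows the fallback semantics, not the bandwidth
behaviour.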

> Even expensive devices where there's an ironclad legal contract between
> the vendor and customer may have bugs that result in only some of the
> bytes being VERIFYed.  We shouldn't support it.
> 

One cannot simply write anything without bugs. See how many quirks we
have for NVMe PCIe SSDs; that doesn't stop us from supporting features
in the kernel. If one can't trust the device, just add a quirk and
emulate.

The bugs don't make this feature useless or the design of the verify
command unusable. There is a significant reason why it exists in the
major device specs on which data-center workloads run. Just because a
few cheap vendors are not being honest, we cannot call multiple TWGs'
decision to add the verify command useless and refuse to add support.

In fact, all file systems should be sending the verify command as part
of scrubbing, which only XFS does as far as I know, since the lack of
an interface is preventing its use.

> Now, everything you say about its value (not consuming bus bandwidth)
> is true, but the device should provide the host with proof-of-work.
> I'd suggest calculating some kind of checksum, even something like a
> SHA-1 of the contents would be worth having.  It doesn't need to be
> crypto-secure; just something the host can verify the device didn't spoof.

I'm not sure how SSD vendors will entertain the proof-of-work idea,
since it will open the door to the same question for discard and any
other command; one of the blunt answers I got was "if you don't trust,
don't buy".

I'm absolutely not dismissing your concerns or the idea of proof of
work. I'm willing to work with you in the TWG and submit the proposal
offline, but right now there is no support for this in the existing
specs.

-ck


