[PATCH 0/6] block: add support for REQ_OP_VERIFY

Wed Jul 13 02:14:42 PDT 2022

On 7/6/22 10:42, Matthew Wilcox wrote:
> On Thu, Jun 30, 2022 at 02:14:00AM -0700, Chaitanya Kulkarni wrote:
>> This adds support for the REQ_OP_VERIFY. In this version we add
> 
> IMO, VERIFY is a useless command.  The history of storage is full of
> devices which simply lie.  Since there's no way for the host to check if
> the device did any work, cheap devices may simply implement it as a NOOP.

Thanks for sharing your feedback regarding cheap devices.

This falls outside the scope of this work, which is not meant to
analyze different vendor implementations of the verify command.

> Even expensive devices where there's an ironclad legal contract between
> the vendor and customer may have bugs that result in only some of the
> bytes being VERIFYed.  We shouldn't support it.

This is not true of enterprise SSDs. I've been involved in product
qualification of high-end enterprise SSDs since 2012, including good
old non-NVMe devices with e.g. the skd driver on Linux/Windows/VMware.

At product qualification time for large data centers, every single
feature gets reviewed in excruciating architectural detail in the
data center environment. A detailed analysis of the feature, including
running cost and actual impact, is carried out, and Service Level
Agreements are confirmed between the vendor and the client. If the
vendor fails to meet the SLA, the product gets disqualified.

What you are describing is a vendor failing to meet the SLA, and I think
we shouldn't consider vendor-specific implementations for a generic
feature.

> 
> Now, everything you say about its value (not consuming bus bandwidth)
> is true, but the device should provide the host with proof-of-work.

Yes, that seems to be missing, but it is not a blocker for this work,
since it is the protocol that needs to provide this information.

We can update the respective specification to add a log page which
shows proof of work for the verify command, e.g. a log page consisting
of information such as:

1. How many LBAs were verified, and how long did it take?
2. What kinds of errors were detected?
3. How many blocks were moved to a safe location?
4. How much data (LBAs) has been moved successfully?
5. How much data was lost permanently to uncorrectable errors?
6. What is the impact on the overall size of the storage, e.g. in
    the case of flash, the reduction in over-provisioning due to
    uncorrectable errors?

But clearly this is outside the scope of this work. If you are
willing to work on this, I'd be happy to draft a TP and work with you.
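
To make the six items above concrete, here is a rough sketch of what
such a log page might look like as a C structure, together with a
host-side sanity check. Everything in it is an assumption made for
illustration: the struct name, field names, field widths, the bitmask
encoding for error kinds, and the helper verify_log_plausible() are
hypothetical and do not come from any NVMe or SCSI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical proof-of-work log page for a verify command.
 * Field names, widths, and ordering are assumptions for illustration;
 * nothing here is taken from the NVMe or SCSI specifications.
 */
struct verify_log_page {
	uint64_t lbas_verified;       /* item 1: LBAs the device reports verifying */
	uint64_t verify_time_us;      /* item 1: time taken, in microseconds */
	uint32_t error_types;         /* item 2: bitmask of error kinds detected */
	uint32_t reserved;            /* explicit padding for 8-byte alignment */
	uint64_t blocks_relocated;    /* item 3: blocks moved to a safe location */
	uint64_t lbas_relocated_ok;   /* item 4: LBAs moved successfully */
	uint64_t lbas_lost;           /* item 5: LBAs lost to uncorrectable errors */
	uint64_t spare_lbas_consumed; /* item 6: over-provisioning used up */
};

/*
 * Host-side sanity check on the reported counters. This assumes that
 * relocated and lost LBAs are disjoint subsets of the verified range.
 * It checks internal consistency only; it cannot show that the device
 * actually read the media.
 */
static bool verify_log_plausible(const struct verify_log_page *log,
				 uint64_t requested_lbas)
{
	/* The device must claim to have covered the whole request. */
	if (log->lbas_verified != requested_lbas)
		return false;

	/* Relocated LBAs cannot exceed what was verified. */
	if (log->lbas_relocated_ok > log->lbas_verified)
		return false;

	/* Lost LBAs must fit in what remains (written to avoid overflow). */
	return log->lbas_lost <= log->lbas_verified - log->lbas_relocated_ok;
}
```

A host could call verify_log_plausible(&log, nr_lbas) after fetching
the log page for a completed verify, and treat a false return as a
reason to distrust the completion status.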

> I'd suggest calculating some kind of checksum, even something like a
> SHA-1 of the contents would be worth having.  It doesn't need to be
> crypto-secure; just something the host can verify the device didn't spoof.

I did not understand exactly what you mean here.

-ck