[PATCH 3/5] nvme: add support for copy offload

Caleb Sander Mateos csander at purestorage.com
Wed May 21 20:41:40 PDT 2025


On Wed, May 21, 2025 at 8:23 PM Keith Busch <kbusch at kernel.org> wrote:
>
> On Wed, May 21, 2025 at 05:51:03PM -0700, Caleb Sander Mateos wrote:
> > On Wed, May 21, 2025 at 5:47 PM Caleb Sander Mateos
> > >
> > > alloc_size should be sizeof(*range) * i? Otherwise this exceeds the
> > > amount of data used by the Copy command, which not all controllers
> > > support (see bit LLDTS of SGLS in the Identify Controller data
> > > structure). We have seen the same behavior with Dataset Management
> > > (always specifying 4 KB of data), which also passes the maximum size
> > > of the allocation to bvec_set_virt().
> >
> > I see that was added in commit 530436c45ef2e ("nvme: Discard
> > workaround for non-conformant devices"). I would rather wait for
> > evidence of non-conformant devices supporting Copy before implementing
> > the same spec-noncompliant workaround. It could be a quirk if
> > necessary.
>
> Right, that's exactly why I didn't bother allocating tighter to what the
> command actually needs. The number of devices that would have needed a
> DSM quirk for the full 4k was untenable, so making the quirk behavior
> the default was a sane compromise. I suppose Copy is a more enterprisey
> feature in comparison, so maybe we can count on devices doing dma
> correctly?

For the record, that change broke Linux hosts sending DSM commands to
our NVMe controller, which was validating that the SGL length exactly
matches the number of data bytes implied by the command. I'm sure
we're in the minority of NVMe controller vendors in aggressively
validating the NVMe command parameters, but it was unfortunate to
discover this change in Linux's behavior. We have since relaxed the
validation on the controller, so we wouldn't reject these Copy
commands. But the extra data in the NVMe command is still wasteful for
transports like NVMe/TCP in-capsule data where the entire data is
unconditionally transferred to the controller.
It would be great to start with a presumption of spec compliance. But
by all means, if we find a significant fraction of controller
implementations can't tolerate a sub-page data buffer, we can apply
the workaround from DSM.

Best,
Caleb



More information about the Linux-nvme mailing list