[PATCH 3/3] nvme: code command_id with a genctr for use-after-free validation

Mon May 17 12:46:13 PDT 2021

On Mon, May 17, 2021 at 12:09:46PM -0700, Bart Van Assche wrote:
> On 5/17/21 10:59 AM, Sagi Grimberg wrote:
> > We cannot detect a (perhaps buggy) controller that is sending us
> > a completion for a request that was already completed (for example
> > sending a completion twice), this phenomenon was seen in the wild
> > a few times.
> > 
> > So to protect against this, we use the upper 4 msbits of the nvme sqe
> > command_id to use as a 4-bit generation counter and verify it matches
> > the existing request generation that is incrementing on every execution.
> > 
> > The 16-bit command_id structure now is constructed by:
> > | xxxx | xxxxxxxxxxxx |
> >   gen    request tag
> > 
> > This means that we are giving up some possible queue depth as 12 bits
> > allow for a maximum queue depth of 4095 instead of 65536, however we
> > never create such long queues anyways so no real harm done.
> 
> Is a four bit generation counter sufficient? Shouldn't such a counter be
> at least 32 bits to provide a reasonable protection against e.g.
> duplicate packets generated by retry mechanisms in networking stacks?

It has to fit in the protocol's command identifier, and that's only 16
bits. Most of the bits for the tag, so the implementation uses the most
available. More could be better, but that would require a spec change.

> Additionally, I do not agree with the statement "we never create such
> long queues anyways". I have already done this myself.

Why? That won't improve bandwidth, and will increase latency. We already
have timeout problems with the current default 1k qdepth on some
devices.