[PATCH 3/3] nvme: code command_id with a genctr for use-after-free validation

Sagi Grimberg sagi at grimberg.me
Mon May 17 13:27:33 PDT 2021


>>> We cannot detect a (perhaps buggy) controller that is sending us
>>> a completion for a request that was already completed (for example
>>> sending a completion twice); this phenomenon has been seen in the
>>> wild a few times.
>>>
>>> So to protect against this, we use the upper 4 bits of the nvme sqe
>>> command_id as a 4-bit generation counter, and verify that it matches
>>> the request's current generation, which is incremented on every
>>> execution.
>>>
>>> The 16-bit command_id is now constructed as:
>>> | xxxx | xxxxxxxxxxxx |
>>>    gen    request tag
>>>
>>> This means we are giving up some possible queue depth, as 12 bits
>>> allow for a maximum queue depth of 4095 instead of 65536; however,
>>> we never create such long queues anyway, so no real harm is done.
>>
>> Is a four-bit generation counter sufficient? Shouldn't such a counter
>> be at least 32 bits to provide reasonable protection against e.g.
>> duplicate packets generated by retry mechanisms in networking stacks?
> 
> It has to fit in the protocol's command identifier, and that's only 16
> bits. Most of those bits are needed for the tag, so the implementation
> uses the most it has available. More could be better, but that would
> require a spec change.

Yes, even if we could expand the genctr, I don't think we should; we may
want to leave some bits for potential future uses. No matter how many
bits we allocate, we can't protect 100% against everything here.
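
For reference, here is a minimal sketch of the encoding (the helper
names are illustrative, not necessarily what the patch uses verbatim):

/* 4-bit generation counter in the top nibble, 12-bit tag below it */
#define NVME_GENCTR_MASK(gen)      ((gen) & 0xf)
#define NVME_TAG_FROM_CID(cid)     ((cid) & 0xfff)
#define NVME_GENCTR_FROM_CID(cid)  (((cid) >> 12) & 0xf)

/* build the 16-bit wire command_id from the request's genctr and tag */
static inline u16 nvme_cid_sketch(u8 genctr, u16 tag)
{
	return (NVME_GENCTR_MASK(genctr) << 12) | (tag & 0xfff);
}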

>> Additionally, I do not agree with the statement "we never create such
>> long queues anyway". I have already done this myself.
> 
> Why? That won't improve bandwidth, and will increase latency. We already
> have timeout problems with the current default 1k qdepth on some
> devices.

Well, if there is a use case that requires queues deeper than 4095
entries, we have no spare bits in the command_id to protect against
this...

I also find it surprising that there is a real benefit to such deep
nvme queues...
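
For completeness, the completion side recovers the tag and genctr from
the wire command_id, looks up the request, and rejects the completion
when the generations don't match. A sketch under the same assumptions
(nvme_req() and blk_mq_tag_to_rq() are existing helpers; the genctr
field in struct nvme_request is what this patch adds):

static struct request *nvme_find_rq_sketch(struct blk_mq_tags *tags,
		u16 command_id)
{
	u16 tag = NVME_TAG_FROM_CID(command_id);
	u8 genctr = NVME_GENCTR_FROM_CID(command_id);
	struct request *rq;

	rq = blk_mq_tag_to_rq(tags, tag);
	if (unlikely(!rq))
		return NULL;
	/* stale or duplicate completion, e.g. for a recycled tag */
	if (unlikely(NVME_GENCTR_MASK(nvme_req(rq)->genctr) != genctr))
		return NULL;
	return rq;
}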


