[PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation

Keith Busch kbusch at kernel.org
Tue May 25 17:39:00 PDT 2021


On Wed, May 26, 2021 at 01:50:29AM +0300, Max Gurtovoy wrote:
> 
> On 5/19/2021 8:43 PM, Sagi Grimberg wrote:
> > We cannot detect a (perhaps buggy) controller that is sending us
> > a completion for a request that was already completed (for example
> > sending a completion twice), this phenomenon was seen in the wild
> > a few times.
> 
> The bad controller should be fixed.
> 
> In the past, I've sent patches that check that sqid match in nvme cqe to
> protect faulty drives that might send
> the completion on a wrong msix.
> 
> My patch wasn't accepted since it added an additional "if" in the fast path.
> 
> Now we're adding much more operation in the fast path because of buggy ctrl
> ?

I shared the same performance concern on v1 on this series. I haven't
been able to test this one yet (only have emulation for two more weeks).

Hannes says the bug this catches happens frequently enough on TCP. If we
don't catch it, we get kernel panic or corruption, so we defintely need to
do something. Sagi correctly noted this type of bug is not unique to TCP
(or even NVMe, for that matter), but if there is a performance impact on
PCI, and no one so far reports such an issue, I would still recommend
this type of mitigation be isolated to transports that actually observe
invalid CQEs.
 
> > So to protect against this, we use the upper 4 msbits of the nvme sqe
> > command_id to use as a 4-bit generation counter and verify it matches
> > the existing request generation that is incrementing on every execution.
> > 
> > The 16-bit command_id structure now is constructed by:
> > | xxxx | xxxxxxxxxxxx |
> >    gen    request tag
> 
> Is this new specification for command_id ?

The host can put whatever it wants in this field. It's an opaque value
as far as the controller is concerned.



More information about the Linux-nvme mailing list