[PATCH 0/3] nvme: protect against possible request reference after completion

Mon May 17 11:47:43 PDT 2021

On Mon, May 17, 2021 at 10:59:52AM -0700, Sagi Grimberg wrote:
> Nothing in nvme protects against referencing a request after it has completed.
> For example, if a buggy controller sends a completion twice for the same
> request, the host can access and modify a request that was already completed.
> 
> At best, this will cause a panic; in the worst case, it can cause silent
> data corruption if the request was already reused and executed by the time
> we reference it.
> 
> The nvme command_id is an opaque value in which, thus far, we have simply
> placed the request tag. To protect against access after completion, we
> introduce a generation counter in the upper 4 bits of the command_id that is
> incremented on every invocation and validated upon reception of a completion.
> This effectively limits the maximum queue depth to 4095, but we hardly ever
> use such long queues (in fabrics the maximum is already 1024).
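A minimal userspace sketch of the scheme quoted above, assuming a 16-bit command_id split into a 4-bit generation (bits 15..12) and a 12-bit tag (bits 11..0). The struct and all sketch_* names are hypothetical and are not the nvme driver's actual helpers. A completion whose generation does not match the request's current generation is rejected, which covers the case where the tag has already been reused for a new command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout of the 16-bit command_id: bits 15..12 hold a
 * 4-bit generation counter, bits 11..0 hold the request tag.
 */
#define CID_GEN_SHIFT 12
#define CID_GEN_MASK  0xfu
#define CID_TAG_MASK  0xfffu

struct sketch_req {
	uint16_t tag;   /* must fit in 12 bits */
	uint8_t genctr; /* incremented (mod 16) on every submission */
};

/* Bump the generation and build the command_id for a new submission. */
static uint16_t sketch_cid_submit(struct sketch_req *rq)
{
	rq->genctr = (rq->genctr + 1) & CID_GEN_MASK;
	return (uint16_t)((rq->genctr << CID_GEN_SHIFT) |
			  (rq->tag & CID_TAG_MASK));
}

static uint16_t sketch_cid_tag(uint16_t cid)
{
	return cid & CID_TAG_MASK;
}

static uint8_t sketch_cid_gen(uint16_t cid)
{
	return (cid >> CID_GEN_SHIFT) & CID_GEN_MASK;
}

/*
 * Look up the request for a completion and validate its generation.
 * Returns NULL for an out-of-range tag or a stale generation, i.e. a
 * completion belonging to an earlier use of a request that was reused.
 */
static struct sketch_req *sketch_cid_lookup(struct sketch_req *rqs,
					    unsigned int nr_tags, uint16_t cid)
{
	uint16_t tag = sketch_cid_tag(cid);

	if (tag >= nr_tags)
		return NULL;
	if (sketch_cid_gen(cid) != rqs[tag].genctr)
		return NULL;
	return &rqs[tag];
}
```

Because the counter is only 4 bits wide, it wraps every 16 submissions of the same tag, so a completion that is stale by an exact multiple of 16 generations would still pass. A duplicate completion that arrives before the tag is reissued also carries a matching generation; the check targets the reuse case described above.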

This is a neat safeguard, even though we haven't seen much indication of
this type of controller bug occurring on PCIe.

It looks pretty light-weight, but I would like to see if this has a
performance impact. I'm still 3 weeks away from physical access to my
site to set up a performance test with my low-latency devices, though.

On patch 2, I think it's safe to just cap the queue depth to the new max
rather than return an error if the user requests more. 4k actually seems
like quite a lot there, too. 1k should be plenty, just as it is for the
fabrics transports, and a 1k limit provides 2 more bits for the gen sequence.
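The bit arithmetic behind that suggestion, as a hypothetical sketch: a 1024-entry tag space needs 10 bits, leaving 6 of the 16 command_id bits for the generation instead of 4. The sk_* names are illustrative only. The clamping function shows the "cap rather than fail" behavior suggested for patch 2; it is not the driver's actual module-parameter handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split if the tag space were limited to 1024 entries. */
#define SK_TAG_BITS        10u
#define SK_GEN_BITS        (16u - SK_TAG_BITS)  /* 6 bits of generation */
#define SK_MAX_QUEUE_DEPTH (1u << SK_TAG_BITS)  /* 1024 tags */

/* Cap a user-requested queue depth at the maximum instead of failing. */
static unsigned int sk_clamp_queue_depth(unsigned int requested)
{
	if (requested > SK_MAX_QUEUE_DEPTH)
		return SK_MAX_QUEUE_DEPTH;
	return requested;
}
```

With 6 generation bits the counter wraps every 64 submissions of a tag rather than every 16.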