[RFC PATCH 0/3] nvme sq associations

Fri Sep 24 20:02:07 PDT 2021

On Fri, Sep 24, 2021 at 09:08:06PM +0000, Andrey Nikitin wrote:
> The NVMe specification allows for namespaces with different performance
> characteristics, as well as allowing IO submissions to any namespace via
> any non-empty submission queue. However, sharing queue resources between
> namespaces with different performance characteristics can cause undesired
> behavior (e.g. head-of-line-blocking for IOs that target a high-performance
> namespace behind IOs that target a low-performance namespace via the same
> queue). In addition, the lack of hardware queue isolation support can cause
> “noisy neighbor” type problems for applications issuing IOs to different
> namespaces of the same controller. This problem may be especially pronounced
> in multi-tenant environments such as the ones provided by cloud services.
> 
> The NVMe 1.4 specification has introduced some optional features (NVM sets
> and SQ associations) that can be utilized to improve this situation provided
> these features are supported by both controllers and host drivers. Namespaces
> can be assigned to NVM sets (by performance characteristics, for example)
> with each NVM set having its own set of associated queues.
> 
> This patch series proposes a simple implementation of NVM sets and
> SQ associations for the NVMe host PCI module.  A controller that supports
> these features, along with a sufficient number of queue pairs (at least
> one per NVM set), will have the available queue pairs associated uniformly
> across each NVM set. IO requests directed at the controller will honor
> the namespace/NVM set/queue association by virtue of each NVM set having
> its own blk-mq tagset associated with it.
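
For reference, one way to read "associated uniformly" is an even split of
the queue pairs, with any remainder going to the lowest-numbered sets. The
sketch below only illustrates that reading; the helper names are
hypothetical and not taken from the patches, and it is plain user-space C
rather than driver code:

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch series): number of queue pairs
 * given to NVM set `set` (0-based) when `nr_queues` pairs are spread as
 * evenly as possible over `nr_sets` sets. Assumes at least one pair per set.
 */
unsigned int queues_for_set(unsigned int nr_queues, unsigned int nr_sets,
			    unsigned int set)
{
	assert(nr_sets > 0 && set < nr_sets);
	assert(nr_queues >= nr_sets);

	/* The first (nr_queues % nr_sets) sets each take one extra pair. */
	return nr_queues / nr_sets + (set < nr_queues % nr_sets ? 1 : 0);
}

/*
 * Hypothetical helper: 0-based index of the first queue pair that belongs
 * to NVM set `set` under the same split.
 */
unsigned int first_queue_for_set(unsigned int nr_queues, unsigned int nr_sets,
				 unsigned int set)
{
	unsigned int base = nr_queues / nr_sets;
	unsigned int rem = nr_queues % nr_sets;

	assert(nr_sets > 0 && set < nr_sets);
	assert(nr_queues >= nr_sets);

	/* Every earlier set took `base` pairs, and up to `rem` took one more. */
	return set * base + (set < rem ? set : rem);
}
```

With 7 queue pairs and 3 sets this gives 3, 2 and 2 pairs, starting at
indices 0, 3 and 5.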

Different submission queue groups per NVM Set sounds right for this
feature, but I'm not sure it makes sense for these to have their own
completion queues: completions from different sets would try to schedule
on the same CPU. I think it should be more efficient to break the 1:1
SQ:CQ pairing, and instead have all the SQs with the same CPU affinity
share a single CQ so that completions from different namespaces could be
handled in a single interrupt.
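
To make the suggestion concrete, here is a rough sketch of the queue ID
mapping I have in mind. It is not code from the series: the struct and
function names are hypothetical, and it assumes every set gets the same
number of SQs and that SQ index i within each set has the same CPU
affinity. The Create I/O Submission Queue command takes a CQID, so several
SQs can point at one CQ.

```c
#include <assert.h>

/* Hypothetical layout: every NVM set owns `queues_per_set` I/O SQs. */
struct sq_layout {
	unsigned int nr_sets;
	unsigned int queues_per_set;
};

/*
 * I/O queue IDs start at 1 (ID 0 is the admin queue). Assumed numbering:
 * SQs are laid out set by set, and `hctx` is the per-set queue index that
 * corresponds to one CPU affinity group.
 */
unsigned int sqid_for(const struct sq_layout *l, unsigned int set,
		      unsigned int hctx)
{
	assert(set < l->nr_sets && hctx < l->queues_per_set);
	return 1 + set * l->queues_per_set + hctx;
}

/*
 * Instead of the 1:1 pairing (cqid == sqid), all SQs that share a per-set
 * index, and therefore a CPU affinity, post to the same CQ.
 */
unsigned int shared_cqid_for(const struct sq_layout *l, unsigned int sqid)
{
	assert(sqid >= 1 && sqid <= l->nr_sets * l->queues_per_set);
	return 1 + (sqid - 1) % l->queues_per_set;
}
```

With 2 sets and 4 queues per set, SQ 3 (set 0) and SQ 7 (set 1) both post
completions to CQ 3, so one interrupt handler can reap both. Only
queues_per_set CQs and vectors are needed rather than one per SQ.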