Seeking help with NVMe arbitration questions

Keith Busch kbusch at kernel.org
Fri Apr 28 08:43:30 PDT 2023


On Thu, Apr 27, 2023 at 05:36:12PM -0700, Wang Yicheng wrote:
> Thanks a lot Keith! This is very helpful!
> 
> 1. Do you then see a way to prioritize a specific set of IOs (favoring
> small writes over large writes) from the IO queues' perspective?
> Initially I was thinking of WRR, which later turned out not to be
> supported. If I want to leverage the IO queues to achieve the same
> goal, from what I understand I can simply send small writes to poll
> queues and allocate more of those queues. Say small writes make up
> 20% of the total IOs on average. If I then set aside 40% of the total
> queues as poll queues, in some sense I give more weight to small
> writes and thus prioritize them.

You might expect that a new command placed on a shallow queue will
be handled ahead of a command placed on a deep queue at the same
time. Indeed, some implementations may even show desirable results
with that scheme, but the spec doesn't guarantee it.
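That said, if you want to experiment with steering the small writes to
the poll queues anyway, here is a rough sketch of the usual way to do
it. It assumes the driver was loaded with poll queues enabled (e.g.
"modprobe nvme poll_queues=4") and uses io_uring's IOPOLL mode, since
that is what routes an I/O to a poll queue; the device name, ring size,
and write size below are placeholders, and error handling is omitted.
Build with -luring.

/* poll_write.c -- sketch: one polled 4KiB write through io_uring.
 * IOPOLL requires the block device to be opened with O_DIRECT. */
#include <fcntl.h>
#include <stdlib.h>
#include <liburing.h>

int main(void)
{
        struct io_uring ring;
        struct io_uring_params p = { .flags = IORING_SETUP_IOPOLL };
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        void *buf;

        io_uring_queue_init_params(8, &ring, &p);
        int fd = open("/dev/nvme0n1", O_WRONLY | O_DIRECT);

        posix_memalign(&buf, 4096, 4096);       /* O_DIRECT needs alignment */

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, fd, buf, 4096, 0);
        io_uring_submit(&ring);

        /* with IOPOLL this spins on the completion queue rather than
         * waiting for an interrupt */
        io_uring_wait_cqe(&ring, &cqe);
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        return 0;
}

Whether that actually helps your small writes is workload- and
device-dependent, for the reason above: the spec doesn't promise the
controller services the shallower queues first.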

For a purely software-side solution, you could use an I/O scheduler
and set your ioprio accordingly.
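A minimal sketch of that route, assuming a scheduler that honors
ioprio (e.g. bfq) is active on the device; the constants mirror
include/uapi/linux/ioprio.h since glibc has no wrapper for the
syscall, and error handling is trimmed:

/* set_ioprio.c -- raise the I/O priority of this process, then issue
 * the small writes from it so the scheduler sees them as high-prio. */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#define IOPRIO_CLASS_SHIFT       13
#define IOPRIO_CLASS_RT          1              /* realtime class */
#define IOPRIO_WHO_PROCESS       1              /* "who" is a pid (0 = self) */
#define IOPRIO_PRIO_VALUE(cl, d) (((cl) << IOPRIO_CLASS_SHIFT) | (d))

int main(void)
{
        /* level 0 is the highest priority within the RT class */
        int prio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, 0);

        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0, prio) < 0) {
                perror("ioprio_set");
                return 1;
        }
        /* ... issue the small writes from this process ... */
        return 0;
}

The ionice(1) utility does the same thing from the shell if you'd
rather not touch the application.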
 
> > > 3. Say I have only 1 default queue and I submit an I/O from some CPU.
> > > Then there is a chance that the I/O would need to cross CPUs, if
> > > the default queue happens not to be on the same core, right?
> >
> > If you only have one queue of a particular type, then the sysfs mq
> > directory for that queue should show cpu_list having all CPUs set,
> > so no CPU crossing is necessary for dispatch. In fact, for any queue
> > count and CPU topology, dispatch should never need to reschedule to
> > another core (that was the point of the design). Completions, on the
> > other hand, are typically affinitized to a single specific CPU, so
> > the completion may happen on a different core than your submission
> > in this scenario.
> 
> 2. You mentioned that completions are affinitized to a single specific
> CPU, and this is exactly what I observed in my test. It also seems
> to hurt performance. Is there a way to query that affinity, or is it
> invisible from the outside?

To query a queue's affinity, check /proc/irq/<#>/effective_affinity.
You can check /proc/interrupts to determine which irq# goes with
which queue.
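If it's useful, here is a rough sketch that automates that lookup. It
just string-matches "nvme" in /proc/interrupts and assumes the queue
name (e.g. nvme0q3) is the last column, so adjust the match for your
setup; error handling is trimmed.

/* nvme_irq_affinity.c -- list each nvme IRQ with its
 * effective_affinity mask. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        char line[4096];
        FILE *f = fopen("/proc/interrupts", "r");

        if (!f) {
                perror("/proc/interrupts");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                if (!strstr(line, "nvme"))
                        continue;
                line[strcspn(line, "\n")] = '\0';

                int irq = atoi(line);           /* leading "  64:" -> 64 */
                char path[64], mask[256] = "?";
                FILE *aff;

                snprintf(path, sizeof(path),
                         "/proc/irq/%d/effective_affinity", irq);
                aff = fopen(path, "r");
                if (aff) {
                        if (fgets(mask, sizeof(mask), aff))
                                mask[strcspn(mask, "\n")] = '\0';
                        fclose(aff);
                }

                /* last token of the line is the queue name */
                char *name = strrchr(line, ' ');
                printf("irq %3d  %-12s  effective_affinity=%s\n",
                       irq, name ? name + 1 : "?", mask);
        }
        fclose(f);
        return 0;
}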


