Seeking for help with NVMe arbitration questions

Thu Apr 27 17:36:12 PDT 2023

Thanks a lot Keith! This is very helpful!

1. Do you see a way to prioritize a specific set of IOs (favoring
small writes over large writes) at the IO queue level? Initially I
was thinking of WRR, which turned out not to be supported. To get the
same effect with the IO queues, my understanding is that I can send
small writes to the poll queues and allocate more of those queues.
Say small writes make up 20% of the total IOs on average. If I assign
40% of the queues as poll queues, small writes get more than their
proportional share of queues, which in some sense prioritizes them.
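
To make the arithmetic in the 20%/40% example concrete, here is a
minimal sketch. It only compares each class's share of queues with its
share of IOs; it assumes nothing about how the controller actually
arbitrates between queues, and the function name is just something I
made up for illustration.

```python
from fractions import Fraction


def queue_share_per_io(io_pct: int, queue_pct: int) -> Fraction:
    """Share of queues divided by share of IOs for one class of IO.

    A value above 1 means the class has more queues than its IO volume
    alone would justify; below 1 means fewer.
    """
    if io_pct <= 0:
        raise ValueError("io_pct must be positive")
    return Fraction(queue_pct, io_pct)


# Numbers from the example: small writes are 20% of IOs on 40% of queues,
# so everything else is 80% of IOs on the remaining 60% of queues.
small = queue_share_per_io(20, 40)
large = queue_share_per_io(100 - 20, 100 - 40)
advantage = small / large

print(small, large, advantage)  # 2 3/4 8/3
```

So under this simple model small writes would get roughly 2.7x the
queue share per IO that the remaining IOs get. Whether that translates
into lower latency is a separate question.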

> > 3. Say I have only 1 default queue and I submit an I/O from some CPU,
> > then there can be a chance that the I/O would need to cross CPUs, if
> > the default queue happens not to be on the same core right?
>
> If you only have one queue of a particular type, then the sysfs mq
> directory for that queue should show cpu_list having all CPU's set,
> so no CPU crossing necessary for dispatch. In fact, for any queue
> count and CPU topo, dispatch should never need to reschedule to
> another core (that was the point of the design). Completions on the
> other hand are typically affinitized to a single specific CPU, so
> the complete may happen on a different core than your submit in
> this scenario.

2. You mentioned that completions are affinitized to a single specific
CPU. This is exactly what I observed in my test, and it seems to hurt
performance. Is there a way to query that affinity, or is it not
visible from outside the kernel?
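
For reference, this is roughly how I would try to read these mappings
myself. It is only a sketch under assumptions: that the per-queue
cpu_list files you mentioned live under /sys/block/<dev>/mq/<n>/, that
the NVMe queue interrupts show up in /proc/interrupts with names like
nvme0q1, and that /proc/irq/<irq>/smp_affinity_list reflects the
completion affinity. The function names and the root-path parameters
are mine, added so the code can be pointed at a test directory.

```python
from pathlib import Path


def hctx_cpu_lists(dev: str, sysfs_root: str = "/sys") -> dict:
    """Map each blk-mq hardware queue index of `dev` to its cpu_list string."""
    mq_dir = Path(sysfs_root) / "block" / dev / "mq"
    result = {}
    for entry in mq_dir.iterdir():
        cpu_list = entry / "cpu_list"
        # Hardware queue directories are named by their numeric index.
        if entry.name.isdigit() and cpu_list.is_file():
            result[int(entry.name)] = cpu_list.read_text().strip()
    return dict(sorted(result.items()))


def irq_affinity(name_prefix: str, proc_root: str = "/proc") -> dict:
    """Map each IRQ name starting with `name_prefix` to its affinity list."""
    proc = Path(proc_root)
    result = {}
    for line in (proc / "interrupts").read_text().splitlines():
        fields = line.split()
        # Numbered IRQ rows look like "45: <counts...> <chip> <name>".
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        irq = fields[0][:-1]
        if irq.isdigit() and fields[-1].startswith(name_prefix):
            path = proc / "irq" / irq / "smp_affinity_list"
            result[fields[-1]] = path.read_text().strip()
    return result
```

On a real system I would call hctx_cpu_lists("nvme0n1") and
irq_affinity("nvme0q"), where nvme0n1 and nvme0q are placeholders for
my device. I am not sure this is the right place to look for the
completion side, hence the question.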

Best,
Yicheng