Seeking for help with NVMe arbitration questions

Keith Busch kbusch at kernel.org
Thu Apr 27 13:45:15 PDT 2023


On Thu, Apr 27, 2023 at 12:03:05PM -0700, Wang Yicheng wrote:
> Thanks Chaitanya for confirming!
> 
> Given that the kernel module doesn't support configuring device-level
> arbitration,

You can't configure the arbitration method. The only thing you can
change is the arbitration burst size.
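For reference, the burst size lives in bits 2:0 of the Arbitration feature (FID 0x01), expressed as a power of two, and can be poked from userspace with nvme-cli rather than through the driver. A sketch, assuming a controller at /dev/nvme0 (device name is a placeholder) and that the controller honors the feature:

```shell
# Hypothetical controller; adjust to your system.
dev=/dev/nvme0

# Read the current Arbitration feature (FID 0x01). Bits 2:0 are the
# Arbitration Burst, as a power of two; 0b111 means "no limit".
nvme get-feature "$dev" --feature-id=0x01

# Set the arbitration burst to 2^3 = 8 commands fetched per turn.
nvme set-feature "$dev" --feature-id=0x01 --value=0x3
```

The remaining bits of that feature (the weighted round robin weights) only matter if the controller is actually running WRR arbitration, which, as above, the driver doesn't let you select.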

> I'm now trying to leverage the I/O queues to achieve the
> same goal. If you could kindly help with the following questions it
> will be much appreciated!
> 
> 1. Is there a way to query the type of each I/O queue and which CPU it
> resides on?

The type of I/O queue selected depends on what you submit. If you
have read queues configured, then reads go on the read queues. If
you have poll queues, high-priority (hipri) I/O goes on those.
Everything else goes on the default queues.
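Whether separate read or poll queues exist at all is decided when the driver loads. A sketch using the pci nvme driver's module parameters (counts here are examples, not recommendations):

```shell
# Reload the nvme driver with a split queue configuration.
# With write_queues set, writes get their own queues and the default
# queues effectively serve reads; poll_queues are only used by I/O
# submitted with polling (hipri) semantics.
modprobe -r nvme
modprobe nvme write_queues=2 poll_queues=2
```

The same parameters can go on the kernel command line as nvme.write_queues= and nvme.poll_queues= if the driver is built in.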

> 2. Is there a way to control which queue a submitted I/O goes into?

The queues are assigned to specific CPUs. If you want a specific
queue to handle your command, then submit your request from one of
the CPUs that map to the queue.

If you want to know which queues map to which CPUs, consult sysfs,
example: /sys/block/nvme0n1/mq/.

That will show each queue as a unique number (not necessarily aligned
to the nvme sq/cq IDs), and each number will have a cpu_list that
tells you which CPUs can dispatch to that queue. sysfs doesn't tell
you the type (read/write/poll), but all queues are accounted for,
and each type's queues will cover all CPUs.
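A small helper to dump that mapping might look like this (the default path assumes a namespace named nvme0n1; point it elsewhere as needed):

```shell
# Print each blk-mq hardware context under an mq directory together
# with the CPU list that dispatches to it.
show_queue_map() {
    local mq_dir=${1:-/sys/block/nvme0n1/mq}
    local q
    for q in "$mq_dir"/*/; do
        printf 'hctx %s: cpus %s\n' "$(basename "$q")" "$(cat "$q/cpu_list")"
    done
}
```

Usage: `show_queue_map`, or `show_queue_map /sys/block/nvme1n1/mq` for another namespace.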
 
> 3. Say I have only 1 default queue and I submit an I/O from some CPU,
> then there can be a chance that the I/O would need to cross CPUs, if
> the default queue happens not to be on the same core right?

If you only have one queue of a particular type, then the sysfs mq
directory for that queue should show a cpu_list with all CPUs set,
so no CPU crossing is necessary for dispatch. In fact, for any queue
count and CPU topology, dispatch should never need to reschedule to
another core (that was the point of the design). Completions, on the
other hand, are typically affinitized to a single specific CPU, so
the completion may happen on a different core than your submission
in this scenario.
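Putting the earlier answers together, steering a submission onto a particular queue just means running the submitter on a CPU from that queue's cpu_list. A minimal sketch, with taskset and dd standing in for the pinning and submission a real application would do itself:

```shell
# Issue a single 4k read from a given CPU, so dispatch goes through
# the hardware queue whose cpu_list contains that CPU.
pinned_read() {
    local cpu=$1 dev=$2
    taskset -c "$cpu" dd if="$dev" of=/dev/null bs=4k count=1 status=none
}

# e.g.: pinned_read 2 /dev/nvme0n1   (CPU and device are placeholders)
```

An application would typically use sched_setaffinity() (or pthread_setaffinity_np()) on its submitting thread instead of taskset.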


