Seeking help with NVMe arbitration questions

Tue Apr 25 17:26:50 PDT 2023

Hi experts,

I'm trying to evaluate how NVMe arbitration affects overall IO
performance, but I have had trouble finding the right material, so
I'm reaching out for your expert help. Could you please comment on the
questions below? I'm new to this area, so some of them might be beyond
the scope of this mailing list, but any help is much appreciated!

1. More specifically, I'm trying to enable WRR (weighted round robin)
arbitration to accelerate a specific set of IOs. So first, I'm trying
to figure out at which layer the arbitration happens. My understanding
is that the Linux block layer the NVMe driver plugs into (blk-mq) uses
a multi-queue design to take advantage of the large number of CPU
cores. But I'm confused about whether WRR operates on the software
queues or on the hardware queues. I suppose it's among the hardware
queues; otherwise it would introduce synchronization between the
software queues, which seems to go against the design goal of
parallelizing IO submission. Am I getting this right?

2. I've also learnt that the submission queues can be further
classified as default/read/poll. I ran some experiments that issued
different IOs to different queue types (intense large writes -> default
queues, sparse small writes -> poll queues), aiming to prioritize the
small writes over the large ones. However, the performance did not
change no matter how I distributed the queues. Is this because the
node has far more submission queues (64 cores and up to 128 queues)
than IO jobs (8 jobs of large writes and 1 job of small writes), so
that giving the small IOs separate queues doesn't help? In other words,
does this mean the queue type has nothing to do with prioritization
(or arbitration)?

3. I came across this thread, https://lwn.net/Articles/810726/, where
Weiping was trying to add WRR support to the kernel, but it seems the
patch was eventually dropped. Does this mean the Linux NVMe driver
currently has no WRR support?

4. If the kernel indeed has no WRR support, what does that mean for
the application layer? Does arbitration become purely a device-level
matter?

Many thanks in advance!

Best,
Yicheng