[PATCH v2] nvme-tcp: Fix I/O queue cpu spreading for multiple controllers
Chaitanya Kulkarni
chaitanyak at nvidia.com
Mon Jan 6 19:54:36 PST 2025
On 1/4/25 13:27, Sagi Grimberg wrote:
> Since day 1 we have been assigning the queue io_cpu very naively. We
> always start from the queue id (controller scope) and assign the queue
> its matching cpu from the online mask. This works fine when the number
> of queues matches the number of cpu cores.
>
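The old scheme boils down to deriving io_cpu from the queue id alone;
roughly this (simplified userspace sketch, not the actual driver code):

    /* Old scheme, simplified: io_cpu depends only on the queue id, so
     * every controller computes the same cpu for the same queue id. */
    int naive_io_cpu(int qid, int nr_online_cpus)
    {
            return (qid - 1) % nr_online_cpus; /* io queues are 1-based */
    }

which is exactly why the stacking shown below happens.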
> The problem starts when we have fewer queues than cpu cores. First, we
> should take the mq_map into account and select a cpu from the cpus that
> the mq_map assigns to this queue, in order to minimize cross-numa cpu
> bouncing.
>
> Second, and even worse, we don't take into account that multiple
> controllers may have assigned queues to a given cpu. As a result we may
> simply compound more and more queues on the same set of cpus, which is
> suboptimal.
>
> We fix this by introducing global per-cpu counters that track the
> number of queues assigned to each cpu. We then select the least used
> cpu based on the mq_map and the per-cpu counters, and assign it as the
> queue io_cpu.
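
If I read the selection correctly, it amounts to something like this
(illustrative sketch with made-up names, not the actual patch code):

    #include <limits.h>

    #define NR_CPUS_MAX 512

    /* Global (module-scope) count of queues assigned to each cpu;
     * the real thing would need atomics or a lock. */
    static int cpu_queue_count[NR_CPUS_MAX];

    /*
     * Pick the least used cpu among the cpus that the mq_map assigns
     * to this hw queue, and account for the new assignment.
     */
    int select_io_cpu(int hctx_idx, const int *mq_map, int nr_cpus)
    {
            int cpu, best = -1, min = INT_MAX;

            for (cpu = 0; cpu < nr_cpus; cpu++) {
                    if (mq_map[cpu] != hctx_idx) /* not mapped here */
                            continue;
                    if (cpu_queue_count[cpu] < min) {
                            min = cpu_queue_count[cpu];
                            best = cpu;
                    }
            }
            if (best >= 0)
                    cpu_queue_count[best]++; /* best effort, see below */
            return best;
    }

presumably with a matching decrement when a queue is torn down, so a
reset/reconnect can rebalance.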
>
> For a single controller the behavior is slightly optimized by selecting
> better cpu candidates via the mq_map, and multiple controllers spread
> their queues among the cpu cores much better, resulting in lower
> average cpu load and a lower likelihood of hitting hotspots.
>
> Note that the accounting is not 100% perfect, and it doesn't need to
> be; we simply make a best effort to select the best candidate cpu core
> we can find at any given point.
>
> Another byproduct is that every controller reset/reconnect may change
> the queues' io_cpu mapping, based on the current LRU accounting scheme.
>
> Here is the baseline queue io_cpu assignment for 4 controllers, 2 queues
> per controller, and 4 cpus on the host:
> nvme1: queue 0: using cpu 0
> nvme1: queue 1: using cpu 1
> nvme2: queue 0: using cpu 0
> nvme2: queue 1: using cpu 1
> nvme3: queue 0: using cpu 0
> nvme3: queue 1: using cpu 1
> nvme4: queue 0: using cpu 0
> nvme4: queue 1: using cpu 1
>
> And this is the fixed io_cpu assignment:
> nvme1: queue 0: using cpu 0
> nvme1: queue 1: using cpu 2
> nvme2: queue 0: using cpu 1
> nvme2: queue 1: using cpu 3
> nvme3: queue 0: using cpu 0
> nvme3: queue 1: using cpu 2
> nvme4: queue 0: using cpu 1
> nvme4: queue 1: using cpu 3
>
> Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
> Suggested-by: Hannes Reinecke <hare at kernel.org>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch at nvidia.com>
-ck