[RFC PATCHv5 0/7] nvme-multipath: introduce adaptive I/O policy
Sagi Grimberg
sagi at grimberg.me
Fri Dec 12 04:08:26 PST 2025
On 05/11/2025 12:33, Nilay Shroff wrote:
> Hi,
>
> This series introduces a new adaptive I/O policy for NVMe native
> multipath. Existing policies such as numa, round-robin, and queue-depth
> are static and do not adapt to real-time transport performance.
It can be argued that queue-depth is a proxy of latency.
> The numa policy
> selects the path closest to the NUMA node of the current CPU, optimizing
> memory and path locality, but ignores actual path performance. The
> round-robin policy distributes I/O evenly across all paths, providing
> fairness but not performance awareness. The queue-depth policy reacts to
> instantaneous queue occupancy, avoiding heavily loaded paths, but does
> not account for actual latency, throughput, or link speed.
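To spell out why queue depth tracks latency: the policy effectively acts
as a least-outstanding-I/O picker, and the backlog on a path grows with
its service latency. A tiny userspace model, purely illustrative, the
struct and field names below are made up and not from the kernel or this
series:

#include <stdio.h>

struct path {
	const char *name;
	int nr_active;		/* outstanding (in-flight) I/Os on this path */
};

/* Pick the path with the fewest in-flight I/Os; a slow path keeps its
 * completions longer, so its backlog rises and it gets picked less. */
static struct path *pick_least_busy(struct path *paths, int n)
{
	struct path *best = &paths[0];

	for (int i = 1; i < n; i++)
		if (paths[i].nr_active < best->nr_active)
			best = &paths[i];
	return best;
}

int main(void)
{
	struct path paths[] = { { "path0", 4 }, { "path1", 17 } };

	printf("selected: %s\n", pick_least_busy(paths, 2)->name);
	return 0;
}

That indirection is presumably why queue-depth already closes much of the
gap in the throttled-path numbers below.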
>
> The new adaptive policy addresses these gaps by selecting paths
> dynamically based on measured I/O latency for both PCIe and fabrics.
Adaptive is not a good name. Maybe weighted-latency or wplat (weighted
path latency) or something like that.
> Latency is
> derived by passively sampling I/O completions. Each path is assigned a
> weight proportional to its latency score, and I/Os are then forwarded
> accordingly. As conditions change (e.g. latency spikes, bandwidth
> differences), path weights are updated, automatically steering traffic
> toward better-performing paths.
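For readers following along, here is a small userspace sketch of the
scheme described above: an EWMA of completion latency per path, a weight
inversely proportional to the smoothed latency, and a weighted pick. All
names, the smoothing factor and the random selection are assumptions for
illustration, not taken from the patches:

#include <stdio.h>
#include <stdlib.h>

struct path {
	const char *name;
	double ewma_lat_us;	/* smoothed completion latency */
	double weight;		/* share of I/O this path should get */
};

/* Passive sampling: fold each observed completion latency into the EWMA. */
static void sample_completion(struct path *p, double lat_us)
{
	const double alpha = 0.125;	/* assumed smoothing factor */

	p->ewma_lat_us = (1.0 - alpha) * p->ewma_lat_us + alpha * lat_us;
}

/* Recompute weights so that faster paths get proportionally more I/O. */
static void update_weights(struct path *paths, int n)
{
	double sum = 0.0;

	for (int i = 0; i < n; i++)
		sum += 1.0 / paths[i].ewma_lat_us;
	for (int i = 0; i < n; i++)
		paths[i].weight = (1.0 / paths[i].ewma_lat_us) / sum;
}

/* Weighted random pick: a path with twice the weight is chosen
 * roughly twice as often. */
static struct path *pick_path(struct path *paths, int n)
{
	double r = (double)rand() / RAND_MAX, acc = 0.0;

	for (int i = 0; i < n; i++) {
		acc += paths[i].weight;
		if (r <= acc)
			return &paths[i];
	}
	return &paths[n - 1];
}

int main(void)
{
	struct path paths[] = {
		{ "fast", 1.0, 0.0 },	/* seed EWMA, updated by samples below */
		{ "slow", 1.0, 0.0 },
	};
	int hits[2] = { 0, 0 };

	/* Pretend we observed completions on each path. */
	for (int i = 0; i < 100; i++) {
		sample_completion(&paths[0], 100.0);	/* ~100 us path */
		sample_completion(&paths[1], 30000.0);	/* ~30 ms throttled path */
	}

	update_weights(paths, 2);
	for (int i = 0; i < 10000; i++)
		hits[pick_path(paths, 2) - paths]++;

	printf("fast=%d slow=%d\n", hits[0], hits[1]);
	return 0;
}

With a ~100 us path and a ~30 ms path this ends up sending roughly 99.7%
of the I/O to the fast path, which is the kind of steering the cover
letter describes.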
>
> Early results show reduced tail latency under mixed workloads and
> improved throughput by exploiting higher-speed links more effectively.
> For example, with NVMf/TCP using two paths (one throttled with ~30 ms
> delay), fio results with random read/write/rw workloads (direct I/O)
> showed:
>
>              numa           round-robin    queue-depth    adaptive
>              ------------   ------------   ------------   ------------
>   READ:      50.0 MiB/s     105 MiB/s      230 MiB/s      350 MiB/s
>   WRITE:     65.9 MiB/s     125 MiB/s      385 MiB/s      446 MiB/s
>   RW:        R:30.6 MiB/s   R:56.5 MiB/s   R:122 MiB/s    R:175 MiB/s
>              W:30.7 MiB/s   W:56.5 MiB/s   W:122 MiB/s    W:175 MiB/s
Seems like a nice gain.
Can you please also test the normal symmetric-paths case? I'd like to
see the trade-off...