[RFC PATCHv4 2/6] nvme-multipath: add support for adaptive I/O policy
Hannes Reinecke
hare at suse.de
Tue Nov 4 06:57:24 PST 2025
On 11/4/25 11:45, Nilay Shroff wrote:
> This commit introduces a new I/O policy named "adaptive". Users can
> configure it by writing "adaptive" to
> "/sys/class/nvme-subsystem/nvme-subsystemX/iopolicy".
>
> The adaptive policy dynamically distributes I/O based on measured
> completion latency. The main idea is to calculate latency for each path,
> derive a weight, and then proportionally forward I/O according to those
> weights.
>
> To ensure scalability, path latency is measured per-CPU. Each CPU
> maintains its own statistics, and I/O forwarding uses these per-CPU
> values. Every ~15 seconds, a simple average latency of the per-CPU
> batched samples is computed and fed into an Exponentially Weighted
> Moving Average (EWMA):
>
>     avg_latency = div_u64(batch, batch_count);
>     new_ewma_latency = (prev_ewma_latency * (WEIGHT-1) + avg_latency)/WEIGHT
>
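
For anyone following along, the two formulas above can be sketched in
plain userspace C roughly as below. This is only an illustration, not
the code from the patch: the helper names batch_avg_latency() and
ewma_update() are made up, plain division stands in for div_u64(), and
seeding the EWMA with the first sample is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Smoothing factor from the description: 7/8 previous value, 1/8 new sample. */
#define EWMA_WEIGHT 8

/* Simple average of one batch; plain division stands in for div_u64(). */
static uint64_t batch_avg_latency(uint64_t batch, uint64_t batch_count)
{
	assert(batch_count > 0);
	return batch / batch_count;
}

/*
 * Fold one batch average into the running EWMA.
 * Assumption of this sketch: a previous value of 0 means "no history yet",
 * so the first sample seeds the average directly.
 */
static uint64_t ewma_update(uint64_t prev_ewma, uint64_t avg_latency)
{
	if (prev_ewma == 0)
		return avg_latency;
	return (prev_ewma * (EWMA_WEIGHT - 1) + avg_latency) / EWMA_WEIGHT;
}
```

With a previous EWMA of 800 ns and a new batch average of 1600 ns, the
update moves the smoothed value by one eighth of the difference.
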
> With WEIGHT = 8, this assigns 7/8 (~87.5%) weight to the previous
> latency value and 1/8 (~12.5%) to the most recent latency. This
> smoothing reduces jitter, adapts quickly to changing conditions,
> avoids storing historical samples, and works well for both low and
> high I/O rates. Path weights are then derived from the smoothed (EWMA)
> latency as follows (example with two paths A and B):
>
>     path_A_score = NSEC_PER_SEC / path_A_ewma_latency
>     path_B_score = NSEC_PER_SEC / path_B_ewma_latency
>     total_score  = path_A_score + path_B_score
>
>     path_A_weight = (path_A_score * 100) / total_score
>     path_B_weight = (path_B_score * 100) / total_score
>
> where:
>   - path_X_ewma_latency is the smoothed latency of a path in nanoseconds
>   - NSEC_PER_SEC is used as a scaling factor since valid latencies
>     are < 1 second
>   - weights are normalized to a 0–64 scale across all paths.
>
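
The score and weight derivation above, generalized to N paths, might
look something like the following userspace sketch. The function name
compute_path_weights() and its signature are hypothetical. The scale is
passed in as a parameter because the formulas use 100 while the notes
mention a 0–64 range. Treating a latency of 0 as "no data, weight 0" is
also an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Derive per-path weights from smoothed latencies (in nanoseconds).
 * score  = NSEC_PER_SEC / ewma_latency
 * weight = score * scale / total_score
 * Assumption: a path with latency 0 has no samples and gets weight 0.
 */
static void compute_path_weights(const uint64_t *ewma_latency_ns,
				 uint32_t *weight, size_t npaths,
				 uint32_t scale)
{
	uint64_t total_score = 0;
	size_t i;

	assert(ewma_latency_ns && weight);

	for (i = 0; i < npaths; i++)
		if (ewma_latency_ns[i])
			total_score += NSEC_PER_SEC / ewma_latency_ns[i];

	for (i = 0; i < npaths; i++) {
		uint64_t score = ewma_latency_ns[i] ?
				 NSEC_PER_SEC / ewma_latency_ns[i] : 0;

		weight[i] = total_score ?
			    (uint32_t)(score * scale / total_score) : 0;
	}
}
```

For two paths at 100 us and 300 us with a scale of 100, the faster path
ends up with roughly three quarters of the weight. Because of integer
division the weights need not sum exactly to the scale.
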
> Path credits are refilled based on this weight, with one credit
> consumed per I/O. When all credits are consumed, the credits are
> refilled again based on the current weight. This ensures that I/O is
> distributed across paths proportionally to their calculated weight.
>
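
The credit scheme described above can be illustrated with a small
userspace sketch. This is not the selection logic from the patch:
struct path_credit and select_path() are hypothetical names, and
picking the first path that still has credits is just the simplest
choice that yields a proportional split over one refill cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-path state: the derived weight and remaining credits. */
struct path_credit {
	uint32_t weight;
	uint32_t credits;
};

/*
 * Pick a path for one I/O and consume one credit from it.
 * When every path is out of credits, refill all of them from their
 * current weights and try once more.
 * Returns the path index, or -1 if no path has a non-zero weight.
 */
static int select_path(struct path_credit *paths, size_t npaths)
{
	size_t i;
	int pass;

	assert(paths != NULL);

	for (pass = 0; pass < 2; pass++) {
		for (i = 0; i < npaths; i++) {
			if (paths[i].credits > 0) {
				paths[i].credits--;
				return (int)i;
			}
		}
		/* All credits consumed: refill from the current weights. */
		for (i = 0; i < npaths; i++)
			paths[i].credits = paths[i].weight;
	}
	return -1;
}
```

With weights 3 and 1, each refill cycle of four I/Os sends three to the
first path and one to the second.
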
> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
> ---
> drivers/nvme/host/core.c | 15 +-
> drivers/nvme/host/ioctl.c | 31 ++-
> drivers/nvme/host/multipath.c | 425 ++++++++++++++++++++++++++++++++--
> drivers/nvme/host/nvme.h | 74 +++++-
> drivers/nvme/host/pr.c | 6 +-
> drivers/nvme/host/sysfs.c | 2 +-
> 6 files changed, 530 insertions(+), 23 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare at suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich