[RFC PATCH 5/5] nvme-multipath: factor fabric link speed into path score

Nilay Shroff nilay at linux.ibm.com
Sun Sep 21 04:12:25 PDT 2025


If the fabric adapter link speed is known, include it when calculating
the path score for the adaptive I/O policy. The score scales linearly
with link speed, so at equal latency a path with twice the link speed
gets twice the score. If the speed is not known (zero), a factor of 1
is used and the score depends on latency alone.

For example, consider a multipath topology with two paths: one with
higher link speed but higher latency, and another with lower link speed
but lower latency. Because the score is the product of link speed and
inverse latency, neither factor dominates on its own. Path selection
does not blindly favor the faster link; I/O is distributed in
proportion to the combined score.

The updated path scoring formula is:

    path_X_score = link_speed_X * (NSEC_PER_SEC / path_X_ewma_latency)

where:
  - link_speed_X is the negotiated link speed of the fabric adapter
    (in Mbps),
  - path_X_ewma_latency is the smoothed latency (ns) derived from I/O
    completions,
  - NSEC_PER_SEC is the fixed-point scaling factor that keeps the
    integer division from truncating to zero for latencies below one
    second.

Scores are then normalized into percentage weights, where total_score
is the sum of the scores of all paths:

    path_X_weight = (path_X_score * 100) / total_score

This ensures that both lower latency and higher link speed contribute
positively to path selection, while still distributing I/O
proportionally when conditions differ across paths.
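
As an illustration only, the arithmetic above can be reproduced in a
small user-space sketch. The struct and function names here
(path_sample, path_score, compute_weights) are hypothetical stand-ins
and are not the kernel code in this patch; the sketch assumes a
non-zero smoothed latency and treats a speed of 0 as "unknown".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for per-path state; not the kernel struct. */
struct path_sample {
	uint32_t speed_mbps;	/* 0 means the link speed is unknown */
	uint64_t slat_ns;	/* smoothed (EWMA) latency, non-zero */
	uint64_t score;
	uint32_t weight;	/* integer percentage of total score */
};

/* score = speed * (NSEC_PER_SEC / latency); speed falls back to 1. */
static uint64_t path_score(uint32_t speed_mbps, uint64_t slat_ns)
{
	uint64_t speed = speed_mbps ? speed_mbps : 1;

	assert(slat_ns > 0);
	return speed * (NSEC_PER_SEC / slat_ns);
}

/* Fill in score and weight for each path; weights truncate downward. */
static void compute_weights(struct path_sample *p, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		p[i].score = path_score(p[i].speed_mbps, p[i].slat_ns);
		total += p[i].score;
	}
	if (!total)
		return;	/* every latency exceeded 1 second; keep old weights */
	for (i = 0; i < n; i++)
		p[i].weight = (uint32_t)(p[i].score * 100 / total);
}
```

With one path at 100000 Mbps and 250 us smoothed latency and another at
25000 Mbps and 125 us, the scores are 400000000 and 200000000, giving
weights of 66 and 33 after integer truncation. The faster link still
gets more I/O, but only twice as much rather than four times, because
its latency is twice as high.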

Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
---
 drivers/nvme/host/multipath.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index bcceb0fceb94..6ab42350284d 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -246,7 +246,7 @@ static void nvme_mpath_add_sample(struct request *rq, struct nvme_ns *ns)
 	unsigned int rw;
 	struct nvme_path_stat *stat;
 	struct nvme_ns *cur_ns;
-	u32 weight;
+	u32 weight, speed;
 	u64 now, latency, avg_lat_ns;
 	u64 total_score = 0;
 	struct nvme_ns_head *head = ns->head;
@@ -347,14 +347,18 @@ static void nvme_mpath_add_sample(struct request *rq, struct nvme_ns *ns)
 				continue;
 
 			/*
-			 * Compute the path score (inverse of smoothed latency),
-			 * scaled by NSEC_PER_SEC. Floating point math is not
-			 * available in the kernel, so fixed-point scaling is
-			 * used instead. NSEC_PER_SEC is chosen as the scale
-			 * because valid latencies are always < 1 second; and
-			 * we ignore longer latencies.
+			 * Compute the path score as the inverse of smoothed
+			 * latency, scaled by NSEC_PER_SEC. If the device speed
+			 * is known, it is factored in: higher speed increases
+			 * the score, lower speed decreases it. Floating point
+			 * math is unavailable in the kernel, so fixed-point
+			 * scaling is used instead. NSEC_PER_SEC is chosen
+			 * because valid latencies are always < 1 second; longer
+			 * latencies are ignored.
 			 */
-			stat->score = div_u64(NSEC_PER_SEC, stat->slat_ns);
+			speed = cur_ns->speed ? : 1;
+			stat->score = speed * div_u64(NSEC_PER_SEC,
+					stat->slat_ns);
 
 			/* Compute total score. */
 			total_score += stat->score;
-- 
2.51.0



