[RFC PATCHv5 2/7] nvme-multipath: add support for adaptive I/O policy

Sat Dec 27 01:37:40 PST 2025

>> Can you please run benchmarks with `blocksize_range`/`bssplit`/`cpuload`/`cpuchunks`/`cpumode` ?
> Okay, so I ran the benchmark using bssplit, cpuload, and cpumode. Below is the job
> file I used for the test, followed by the observed throughput result for reference.
>
> Job file:
> =========
>
> [global]
> time_based
> runtime=120
> group_reporting=1
>
> [cpu]
> ioengine=cpuio
> cpuload=85
> cpumode=qsort
> numjobs=32
>
> [disk]
> ioengine=io_uring
> filename=/dev/nvme1n2
> rw=<randread/randwrite/randrw>
> bssplit=4k/10:32k/10:64k/10:128k/30:256k/10:512k/30
> iodepth=32
> numjobs=32
> direct=1
>
> Throughput:
> ===========
>
>           numa          round-robin   queue-depth    adaptive
>           -----------   -----------   -----------    ---------
> READ:    1120 MiB/s    2241 MiB/s    2233 MiB/s     2215 MiB/s
> WRITE:   1107 MiB/s    1875 MiB/s    1847 MiB/s     1892 MiB/s
> RW:      R:1001 MiB/s  R:1047 MiB/s  R:1086 MiB/s   R:1112 MiB/s
>           W:999  MiB/s  W:1045 MiB/s  W:1084 MiB/s   W:1111 MiB/s
>
> When comparing the results, I did not observe a significant throughput
> difference between the queue-depth, round-robin, and adaptive policies.
> With random I/O of mixed sizes, the adaptive policy appears to average
> out the varying latency values and distribute I/O reasonably evenly
> across the active paths (assuming symmetric paths).
>
> Next I'd implement I/O size buckets and also per-numa node weight and
> then rerun tests and share the result. Let's see if these changes help
> further improve the throughput number for adaptive policy. We may then
> again review the results and discuss further.
>
> Thanks,
> --Nilay

Two comments:
1. I'd bias the read split slightly towards small block sizes, and the
write split towards larger block sizes.
2. I'd also suggest measuring a variant where the weight calculation is
averaged across all cores of a NUMA node and the result is then set
per-CPU, so that the datapath does not introduce serialization.
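
To make (2) a bit more concrete, here is a rough user-space sketch of the
idea, not kernel code and not what the patch does. The function name, the
input dictionaries, the inverse-latency weighting and the normalization to
100 are all assumptions for illustration only. The point it shows is the
split: a slow path averages per-CPU latency samples per NUMA node, then
publishes a private copy of the node-wide weights to every CPU of that node,
so the I/O path only ever reads its own CPU's copy.

```python
from collections import defaultdict


def node_averaged_weights(cpu_to_node, cpu_path_latency_us):
    """Hypothetical helper: return {cpu: {path: weight}}.

    cpu_to_node:          {cpu_id: numa_node_id}
    cpu_path_latency_us:  {cpu_id: {path_name: smoothed latency in usecs}}
    """
    # Slow path, step 1: accumulate per-CPU samples per (node, path).
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for cpu, per_path in cpu_path_latency_us.items():
        node = cpu_to_node[cpu]
        for path, latency in per_path.items():
            sums[node][path] += latency
            counts[node][path] += 1

    # Slow path, step 2: turn the node-wide average latency into weights.
    # Assumption: weight is proportional to 1 / average latency and the
    # weights of one node are normalized to (roughly) 100.
    node_weights = {}
    for node, per_path in sums.items():
        inverse = {p: counts[node][p] / s for p, s in per_path.items() if s > 0}
        total = sum(inverse.values())
        if total == 0:
            continue  # no usable samples for this node
        node_weights[node] = {p: round(100 * v / total) for p, v in inverse.items()}

    # Slow path, step 3: publish a private copy to every CPU of the node,
    # so the fast path never touches shared state.
    return {
        cpu: dict(node_weights[node])
        for cpu, node in cpu_to_node.items()
        if node in node_weights
    }
```

With this shape, the per-CPU samples of one node can disagree, but every CPU
on that node ends up with the same weights, and the lookup at submission time
is a plain read of per-CPU data.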