[PATCH 3/3] nvme-multipath: add "use_nonoptimized" module option

John Meneghini jmeneghi at redhat.com
Wed Sep 27 06:11:51 PDT 2023


Ewan and I discussed this patch and agreed it is not something we want to go upstream. It was only included here for completeness.
This patch was used only to increase the number of active paths during the testing of patches 01 and 02.  The test bed Ewan used
originally had only 4 35Gbps nvme-tcp controllers (2 active/optimized and 2 active/non-optimized).  He used this patch to change
the multipathing policy and enable the use of all 4 controllers, resulting in 4 active paths.
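For reference, here is how the option would be toggled with the RFC patch applied.  The parameter name comes from the patch
description below; the runtime sysfs path is an assumption and only works if the patch declares the parameter writable:

  # at module load / on the kernel command line, per the patch description
  nvme_core.use_nonoptimized=true

  # or at runtime, assuming the patch declares the parameter writable
  echo Y > /sys/module/nvme_core/parameters/use_nonoptimized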

Those test results can be seen here:

https://people.redhat.com/jmeneghi/.multipath/test1/A400-TCP-FIO-RR.ps - round-robin io tests
https://people.redhat.com/jmeneghi/.multipath/test1/A400-TCP-FIO-QD.ps - queue-depth io tests

After sending these patches upstream on Monday, Ewan and I built a new test bed with 8 controllers - 4 active/optimized paths and
4 active/non-optimized paths.  This provided a true multi-path test bed, and the use_nonoptimized patch was no longer needed.

In addition, this test bed has a mixed controller subsystem consisting of 4 32Gb nvme-fc controllers and 4 100Gbps nvme-tcp
controllers.  This provided an optimal mix of active/optimized controller paths across transports with inherently different
latency characteristics.

  [root@rhel-storage-104 ~]# nvme list-subsys
nvme-subsys3 - NQN=nqn.1992-08.com.netapp:sn.2b82d9b13bb211ee8744d039ea989119:subsystem.SS104a
\
  +- nvme10 fc traddr=nn-0x2027d039ea98949e:pn-0x202cd039ea98949e,host_traddr=nn-0x200000109b9b7f0d:pn-0x100000109b9b7f0d live
  +- nvme11 fc traddr=nn-0x2027d039ea98949e:pn-0x2029d039ea98949e,host_traddr=nn-0x200000109b9b7f0c:pn-0x100000109b9b7f0c live
  +- nvme12 fc traddr=nn-0x2027d039ea98949e:pn-0x2028d039ea98949e,host_traddr=nn-0x200000109b9b7f0d:pn-0x100000109b9b7f0d live
  +- nvme13 tcp traddr=172.18.50.13,trsvcid=4420,src_addr=172.18.50.3 live
  +- nvme2 tcp traddr=172.18.60.16,trsvcid=4420,src_addr=172.18.60.4 live
  +- nvme3 fc traddr=nn-0x2027d039ea98949e:pn-0x202dd039ea98949e,host_traddr=nn-0x200000109b9b7f0c:pn-0x100000109b9b7f0c live
  +- nvme4 tcp traddr=172.18.50.15,trsvcid=4420,src_addr=172.18.50.3 live
  +- nvme9 tcp traddr=172.18.60.14,trsvcid=4420,src_addr=172.18.60.4 live

I shared the performance graphs for this testbed using these patches at the ALPSS conference today.

Graphs of inflight I/O on 4 Optimized paths (2 FC, 2 TCP) with round-robin:

https://people.redhat.com/jmeneghi/.multipath/test2/A400-TEST1-FIO-RR.ps

and queue-depth:

https://people.redhat.com/jmeneghi/.multipath/test2/A400-TEST1-FIO-QD.ps

Also included is a graph showing the combined number of inflight I/Os on all paths, plotted for both RR and QD:

https://people.redhat.com/jmeneghi/.multipath/test2/A400-TEST1-FIO-MAX.ps

What we see in these graphs is that RR drives I/O down only 1 of the 4 possible paths, while with QD all 4 paths are used about
equally and the maximum inflight I/O count nearly doubles.
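For anyone reproducing these runs, the policy is switched per subsystem through the iopolicy sysfs attribute.  The subsystem name
below matches the list-subsys output above; the "queue-depth" value is assumed to be the policy name added by patches 01 and 02
and is not available on an unpatched kernel:

  # show the current policy for the subsystem
  cat /sys/class/nvme-subsystem/nvme-subsys3/iopolicy

  # round-robin is available upstream; queue-depth assumes patches 01/02 are applied
  echo round-robin > /sys/class/nvme-subsystem/nvme-subsys3/iopolicy
  echo queue-depth > /sys/class/nvme-subsystem/nvme-subsys3/iopolicy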

Hope this helps.

John A. Meneghini
Senior Principal Platform Storage Engineer
RHEL SST - Platform Storage Group
jmeneghi at redhat.com

On 9/27/23 13:31, Sagi Grimberg wrote:
> 
>>> Setting nvme_core.use_nonoptimized=true will cause the path
>>> selector to treat optimized and nonoptimized paths equally.
>>>
>>> This is because although an NVMe fabrics target device may report
>>> an unoptimized ANA state, it is possible that other factors such
>>> as fabric latency are a large factor in the I/O service time.  And,
>>> throughput may improve overall if nonoptimized ports are also used.
>>>
>>> Signed-off-by: Ewan D. Milne <emilne at redhat.com>
>>> ---
>>>   drivers/nvme/host/multipath.c | 22 +++++++++++++++++++---
>>>   1 file changed, 19 insertions(+), 3 deletions(-)
>>>
>> No. Please don't.
>>
>> There's a reason why controllers specify paths as 'active/optimized' or 'active/non-optimized'. If they had wanted us to use 
>> all paths they would have put them into the same group.
>> They tend to get very unhappy if you start using them at the same time.
>> (Triggering failover etc.)
> 
> I have to agree here. This is effectively a modparam that says
> all paths are optimized regardless of what the controller reports.
> 
> While I do acknowledge that there may be some merit to using non-optimized
> paths as well, it's almost impossible to know some latent optimum path
> distribution. Hence the host forfeits even attempting.
> 
> If the controller wants all paths used, it should make all paths
> optimized and the host can examine QD accumulating on some paths
> vs others.
> 



