[PATCH v8 8/8] nvme-multipath: queue-depth support for marginal paths

John Meneghini jmeneghi at redhat.com
Fri Jul 11 07:53:23 PDT 2025


On 7/10/25 10:59 PM, Muneendra Kumar wrote:
> Correct me if I am wrong.
>>> . In the case where
>>> all paths are marginal and no optimized or non-optimized path is
>>> found, we fall back to __nvme_find_path which selects the best marginal path
> 
> With the current patch, will __nvme_find_path always pick the path from the non-optimized paths?

Not necessarily. I think it all comes down to this code:

                 switch (ns->ana_state) {
                 case NVME_ANA_OPTIMIZED:
                         if (!nvme_ctrl_is_marginal(ns->ctrl)) {
                                 if (distance < found_distance) {
                                         found_distance = distance;
                                         found = ns;
                                 }
                                 break;
                         }
                         fallthrough;
                 case NVME_ANA_NONOPTIMIZED:
                         if (distance < fallback_distance) {
                                 fallback_distance = distance;
                                 fallback = ns;
                         }
                         break;

Any NVME_ANA_OPTIMIZED path that is marginal falls through and becomes a part of the fallback ns algorithm, where it competes with the NVME_ANA_NONOPTIMIZED paths.

In the case where there is at least one non-marginal NVME_ANA_OPTIMIZED path, it works correctly.  You will always find that NVME_ANA_OPTIMIZED
path.  In the case where there are no non-marginal NVME_ANA_OPTIMIZED paths it turns into kind of a crapshoot. You end up with the first fallback
ns that's found.  That could be a marginal NVME_ANA_OPTIMIZED path or an NVME_ANA_NONOPTIMIZED path.  It all depends upon how the head->list is
sorted and whether there are any disabled paths.
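
To make the ordering dependence concrete, here is a rough userspace sketch of the selection logic above. It is not the kernel code: struct path, pick_path() and the flat array standing in for head->list are made up for illustration, the final "use found, else fallback" step is my assumption about how the function ends, and all paths are given the same distance (so this does not model the numa iopolicy or disabled paths).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; names are illustrative only. */
enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED, ANA_INACCESSIBLE };

struct path {
	enum ana_state ana_state;
	bool marginal;		/* models nvme_ctrl_is_marginal(ns->ctrl) */
	int distance;
};

/* Returns the index of the selected path, or -1 if none is usable. */
int pick_path(const struct path *paths, size_t n)
{
	int found_distance = INT_MAX, fallback_distance = INT_MAX;
	int found = -1, fallback = -1;

	for (size_t i = 0; i < n; i++) {
		int distance = paths[i].distance;

		switch (paths[i].ana_state) {
		case ANA_OPTIMIZED:
			if (!paths[i].marginal) {
				if (distance < found_distance) {
					found_distance = distance;
					found = (int)i;
				}
				break;
			}
			/* fall through: a marginal optimized path is only a fallback */
		case ANA_NONOPTIMIZED:
			/* strict '<' means the first path seen wins a tie */
			if (distance < fallback_distance) {
				fallback_distance = distance;
				fallback = (int)i;
			}
			break;
		default:
			break;
		}
	}
	/* Assumed: prefer a healthy optimized path, otherwise the fallback. */
	return found >= 0 ? found : fallback;
}

void demo_order_dependence(void)
{
	/* Same two paths, different list order, equal distance. */
	struct path a[] = { { ANA_OPTIMIZED, true, 10 }, { ANA_NONOPTIMIZED, false, 10 } };
	struct path b[] = { { ANA_NONOPTIMIZED, false, 10 }, { ANA_OPTIMIZED, true, 10 } };
	/* Adding one healthy optimized path makes the result deterministic. */
	struct path c[] = { { ANA_OPTIMIZED, true, 10 }, { ANA_NONOPTIMIZED, false, 10 },
			    { ANA_OPTIMIZED, false, 10 } };

	assert(pick_path(a, 2) == 0);	/* marginal optimized path is chosen */
	assert(pick_path(b, 2) == 0);	/* non-optimized path is chosen */
	assert(pick_path(c, 3) == 2);	/* healthy optimized path always wins */
}
```

In this model the only thing that changes between the first two cases is the list order, which is the behavior I'm describing: with equal distances, whichever fallback candidate comes first in the list is the one that gets picked.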

In our testing I've seen that this sometimes selects the NVME_ANA_OPTIMIZED path and sometimes the NVME_ANA_NONOPTIMIZED path.

In the simple test case, when the first two paths are optimized and only one is marginal, this algorithm always selects the non-marginal NVME_ANA_OPTIMIZED path.

It's only in the more complicated test, when all NVME_ANA_OPTIMIZED paths are marginal, that I see some unpredictability.

/John




