[RFC PATCHv2 2/3] nvme: introduce multipath_head_always module param

Hannes Reinecke hare at suse.de
Mon Apr 28 22:49:17 PDT 2025


On 4/28/25 09:39, Nilay Shroff wrote:
> 
> 
> On 4/28/25 12:27 PM, Hannes Reinecke wrote:
>> On 4/25/25 12:33, Nilay Shroff wrote:
>>> Currently, a multipath head disk node is not created for single-ported
>>> NVMe adapters or private namespaces. However, creating a head node in
>>> these cases can help transparently handle transient PCIe link failures.
>>> Without a head node, features like delayed removal cannot be leveraged,
>>> making it difficult to tolerate such link failures. To address this,
>>> this commit introduces the nvme_core module parameter multipath_head_always.
>>>
>>> When this param is set to true, it forces the creation of a multipath
>>> head node regardless of the NVMe disk or namespace type. This option thus
>>> allows the use of delayed removal of the head node even for single-ported
>>> NVMe disks and private namespaces, and so helps transparently handle
>>> transient PCIe link failures.
>>>
>>> By default multipath_head_always is set to false, thus preserving the
>>> existing behavior. Setting it to true enables improved fault tolerance
>>> in PCIe setups. Moreover, please note that enabling this option would
>>> also implicitly enable nvme_core.multipath.
>>>
>>> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
>>> ---
>>>    drivers/nvme/host/multipath.c | 70 +++++++++++++++++++++++++++++++----
>>>    1 file changed, 63 insertions(+), 7 deletions(-)
>>>
>> I really would model this according to dm-multipath where we have the
>> 'fail_if_no_path' flag.
>> This can be set for PCIe devices to retain the current behaviour
>> (which we need for things like 'md' on top of NVMe) whenever
>> this flag is set.
>>
> Okay, so you mean that when the sysfs attribute "delayed_removal_secs"
> under the head disk node is _NOT_ configured (or delayed_removal_secs
> is set to zero), the internal flag "fail_if_no_path" is set to
> true. In the other case, when "delayed_removal_secs" is set to
> a non-zero value, we set "fail_if_no_path" to false. Is that correct?
> 
Don't make it overly complicated.
'fail_if_no_path' (and the inverse 'queue_if_no_path') can both be
mapped onto delayed_removal_secs; if the value is '0' then the head
disk is immediately removed (the 'fail_if_no_path' case), and if it's
-1 it is never removed (the 'queue_if_no_path' case).
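
To make the mapping concrete, here is a minimal sketch of how a
delayed_removal_secs value could be classified. It is illustration only:
the enum and the function name are hypothetical and not taken from the
patch, and treating a positive value as "remove after that many seconds"
and other negative values as invalid are assumptions on my side.

```c
/* Hypothetical names for illustration; not part of the patch. */
enum no_path_policy {
	NO_PATH_FAIL,		/* 0: head disk is removed immediately */
	NO_PATH_QUEUE,		/* -1: head disk is never removed */
	NO_PATH_DELAY,		/* > 0: assumed to mean removal after N seconds */
	NO_PATH_INVALID,	/* other negative values: assumed to be rejected */
};

/* Map a delayed_removal_secs value onto the dm-multipath-style policy. */
static enum no_path_policy classify_delayed_removal(int secs)
{
	if (secs == 0)
		return NO_PATH_FAIL;	/* the 'fail_if_no_path' case */
	if (secs == -1)
		return NO_PATH_QUEUE;	/* the 'queue_if_no_path' case */
	if (secs > 0)
		return NO_PATH_DELAY;
	return NO_PATH_INVALID;
}
```

With such a mapping there is no need for a separate internal flag; the
single attribute value carries the policy.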

Question, though: How does it interact with the existing 
'ctrl_loss_tmo'? Both describe essentially the same situation...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


