[PATCH for-next 4/4] nvme-multipath: add multipathing for uring-passthrough commands

Hannes Reinecke hare at suse.de
Wed Jul 13 06:30:28 PDT 2022


On 7/13/22 14:43, Sagi Grimberg wrote:
> 
> 
> On 7/13/22 14:49, Hannes Reinecke wrote:
>> On 7/13/22 13:00, Sagi Grimberg wrote:
>>>
>>>>> Maybe the solution is to just not expose a /dev/ng for the mpath 
>>>>> device
>>>>> node, but only for bottom namespaces. Then it would be completely
>>>>> equivalent to scsi-generic devices.
>>>>>
>>>>> It just creates an unexpected mix of semantics of best-effort
>>>>> multipathing with just path selection, but no requeue/failover...
>>>>
>>>> Which is exactly the same semantics as SG_IO on the dm-mpath nodes.
>>>
>>> I view uring passthru somewhat as a different thing than sending SG_IO
>>> ioctls to dm-mpath. But it can be argued otherwise.
>>>
>>> BTW, the only consumer of it that I'm aware of commented, back when
>>> retrying SG_IO submissions in dm-mpath was attempted, that he expects
>>> dm-mpath to retry SG_IO (https://www.spinics.net/lists/dm-devel/msg46924.html).
>>>
>>> From Paolo:
>>> "The problem is that userspace does not have a way to direct the 
>>> command to a different path in the resubmission. It may not even have 
>>> permission to issue DM_TABLE_STATUS, or to access the /dev nodes for 
>>> the underlying paths, so without Martin's patches SG_IO on dm-mpath 
>>> is basically unreliable by design."
>>>
>>> I didn't manage to track down any followup after that email though...
>>>
>> I did; 'twas me who was involved in the initial customer issue leading 
>> up to that.
>>
>> Amongst all the other issues we've found, the prime problem with SG_IO
>> is that it needs to be directed to the 'active' path.
>> For this, device-mapper has a distinct callout (dm_prepare_ioctl), which
>> essentially returns the current active path device. The device-mapper
>> core then issues the command on that active path.
>>
>> All nice and good, _unless_ that command triggers an error.
>> Normally the error would be intercepted by the dm-multipath end_io
>> handler, which would set the path offline.
>> But as ioctls do not use the normal I/O path, the end_io handler is
>> never called, and further SG_IO calls are happily routed down the
>> failed path.
>>
>> And the customer had to use SG_IO (or, in qemu-speak, LUN passthrough),
>> as his application/filesystem makes heavy use of persistent reservations.
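The failure mode described above can be reduced to a small, self-contained toy model. This is a sketch under stated assumptions, not device-mapper code: `struct mpath`, `select_path`, `mpath_end_io`, `mpath_submit_io` and `mpath_submit_ioctl` are hypothetical names. Only the roles they play come from the thread: ioctls are routed to the active path (the job dm_prepare_ioctl does), and only regular I/O completes through an end_io handler that can fail a path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_PATHS 2

/* Hypothetical toy model, not kernel code: one multipath device, two paths. */
struct path {
    bool broken;  /* the path really is failing */
    bool failed;  /* multipath has marked the path offline */
};

struct mpath {
    struct path paths[NR_PATHS];
    int active;   /* index of the current path, -1 if none is usable */
};

/* Trivial path selector: first path not marked failed. */
static int select_path(const struct mpath *m)
{
    for (int i = 0; i < NR_PATHS; i++)
        if (!m->paths[i].failed)
            return i;
    return -1;
}

/* Stand-in for the end_io handler: on error, fail the path and reselect. */
static void mpath_end_io(struct mpath *m, int idx, int err)
{
    assert(idx >= 0 && idx < NR_PATHS);
    if (err) {
        m->paths[idx].failed = true;
        m->active = select_path(m);
    }
}

/* Regular I/O: sent down the active path, completion passes through end_io. */
static int mpath_submit_io(struct mpath *m)
{
    int idx = m->active;
    if (idx < 0)
        return -EIO;
    int err = m->paths[idx].broken ? -EIO : 0;
    mpath_end_io(m, idx, err);
    return err;
}

/* Ioctl passthrough: routed to the active path (the role dm_prepare_ioctl
 * plays), but the status goes straight back to the caller and end_io is
 * never called, so the path is never marked failed. */
static int mpath_submit_ioctl(struct mpath *m, int *path_used)
{
    int idx = m->active;
    *path_used = idx;
    if (idx < 0)
        return -EIO;
    return m->paths[idx].broken ? -EIO : 0;
}
```

With path 0 broken, every call to `mpath_submit_ioctl` keeps returning `-EIO` from path 0. Only after a single regular I/O goes through `mpath_end_io` does the model fail over to path 1.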
> 
> How did this conclude, Hannes?

It didn't. The proposed interface got rejected, and now we need to come 
up with an alternative solution.
Which we haven't found yet.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare at suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer