nvme multipath support V4
Guan Junxiong
guanjunxiong at huawei.com
Sun Oct 22 19:08:52 PDT 2017
Hi Christoph,
On 2017/10/19 0:52, Christoph Hellwig wrote:
> Hi all,
>
> this series adds support for multipathing, that is accessing nvme
> namespaces through multiple controllers to the nvme core driver.
>
> It is a very thin and efficient implementation that relies on
> close cooperation with other bits of the nvme driver, and a few small
> and simple block helpers.
>
> Compared to dm-multipath the important differences are how management
> of the paths is done, and how the I/O path works.
>
> Management of the paths is fully integrated into the nvme driver,
> for each newly found nvme controller we check if there are other
> controllers that refer to the same subsystem, and if so we link them
> up in the nvme driver. Then for each namespace found we check if
> the namespace id and identifiers match to check if we have multiple
> controllers that refer to the same namespaces. For now path
> availability is based entirely on the controller status, which at
> least for fabrics will be continuously updated based on the mandatory
> keep alive timer. Once the Asymmetric Namespace Access (ANA)
> proposal passes in NVMe we will also get per-namespace states in
> addition to that, but for now any details of that remain confidential
> to NVMe members.
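>
> To make this concrete, here is a simplified sketch of the
> controller-to-subsystem matching (illustrative only; the
> nvme_find_or_alloc_subsystem helper and the list/field names are
> assumptions, not the literal patch code; the same idea applies to
> matching namespaces by NSID and identifiers):
>
>     static LIST_HEAD(nvme_subsystems);  /* all known subsystems */
>
>     /*
>      * When a new controller shows up, look for an existing subsystem
>      * with the same NQN and link the controller into it, otherwise
>      * allocate a fresh subsystem.  Locking and most error handling
>      * are omitted for brevity.
>      */
>     static struct nvme_subsystem *
>     nvme_find_or_alloc_subsystem(struct nvme_ctrl *ctrl, const char *subnqn)
>     {
>         struct nvme_subsystem *subsys;
>
>         list_for_each_entry(subsys, &nvme_subsystems, entry)
>             if (!strcmp(subsys->subnqn, subnqn))
>                 goto found;
>
>         subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
>         if (!subsys)
>             return NULL;
>         strlcpy(subsys->subnqn, subnqn, sizeof(subsys->subnqn));
>         INIT_LIST_HEAD(&subsys->ctrls);
>         list_add_tail(&subsys->entry, &nvme_subsystems);
>     found:
>         list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
>         return subsys;
>     }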
>
> The I/O path is very different from the existing multipath drivers,
> which is enabled by the fact that NVMe (unlike SCSI) does not support
> partial completions - a controller will either complete a whole
> command or not, but never only complete parts of it. Because of that
> there is no need to clone bios or requests - the I/O path simply
> redirects the I/O to a suitable path. For successful commands
> multipath is not in the completion stack at all. For failed commands
> we decide if the error could be a path failure, and if yes remove
> the bios from the request structure and requeue them before completing
> the request. Altogether this means there is no performance
> degradation compared to normal nvme operation when using the multipath
> device node (at least not until I find a dual ported DRAM backed
> device :))
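>
> The failover itself can be sketched roughly like this (illustrative
> only; the ns->head requeue_list/requeue_lock/requeue_work fields are
> assumptions standing in for the shared per-subsystem state):
>
>     /*
>      * Called when a command failed in a way that looks like a path
>      * error: steal the bios off the request, park them on the shared
>      * node's requeue list, complete the request itself without error
>      * and kick the requeue work so the bios get resubmitted on
>      * another path.
>      */
>     static void nvme_failover_req(struct request *req)
>     {
>         struct nvme_ns *ns = req->q->queuedata;
>         unsigned long flags;
>
>         spin_lock_irqsave(&ns->head->requeue_lock, flags);
>         blk_steal_bios(&ns->head->requeue_list, req);
>         spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
>
>         blk_mq_end_request(req, 0);     /* complete without error */
>         kblockd_schedule_work(&ns->head->requeue_work);
>     }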
>
> A git tree is available at:
>
> git://git.infradead.org/users/hch/block.git nvme-mpath
>
> gitweb:
>
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/nvme-mpath
>
> Changes since V3:
> - new block layer support for hidden gendisks (see the sketch after
> this list)
> - a couple new patches to refactor device handling before the
> actual multipath support
> - don't expose per-controller block device nodes
> - use /dev/nvmeXnZ as the device nodes for the whole subsystem.
If the per-controller block device nodes are hidden, how can user-space tools
such as multipath-tools and nvme-cli (if it supports multipath) know the
status of each path of the multipath device?

In some cases, the admin wants to know which path is down, or which path is
degraded, e.g. suffering intermittent I/O errors because of a shaky link, so
that he can repair the link or isolate it from the normal paths.
Regards
Guan
> - expose subsystems in sysfs (Hannes Reinecke)
> - fix a subsystem leak when duplicate NQNs are found
> - fix up some names
> - don't clear current_path if freeing a different namespace
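>
> For illustration, registering a per-path disk as hidden could look
> roughly like this (a sketch; the helper name, the disk name and the
> exact registration details are assumptions):
>
>     /*
>      * A gendisk registered with GENHD_FL_HIDDEN still has a request
>      * queue and appears in sysfs, but gets no /dev block node and no
>      * partition scan, so only the subsystem-wide node stays visible
>      * to user space.
>      */
>     static void nvme_register_path_disk(struct nvme_ctrl *ctrl,
>             struct nvme_ns *ns)
>     {
>         struct gendisk *disk = alloc_disk(0);
>
>         if (!disk)
>             return;
>         disk->flags |= GENHD_FL_HIDDEN;
>         disk->queue = ns->queue;
>         /* hypothetical per-path name */
>         snprintf(disk->disk_name, sizeof(disk->disk_name), "nvme0n1path0");
>         device_add_disk(ctrl->device, disk);
>     }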
>
> Changes since V2:
> - don't create duplicate subsystems on reset (Keith Busch)
> - free requests properly when failing over in I/O completion (Keith Busch)
> - new device names: /dev/nvm-sub%dn%d
> - expose the namespace identification sysfs files for the mpath nodes
>
> Changes since V1:
> - introduce new nvme_ns_ids structure to clean up identifier handling
> - generic_make_request_fast is now named direct_make_request and calls
> generic_make_request_checks (see the sketch after this list)
> - reset bi_disk on resubmission
> - create sysfs links between the existing nvme namespace block devices and
> the new shared mpath device
> - temporarily added the timeout patches from James; these should go into
> nvme-4.14, though
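>
> A sketch of how the resubmission side fits together with these
> helpers (illustrative; nvme_find_path() stands in for the real path
> selection and the no-path case is simplified):
>
>     static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
>             struct bio *bio)
>     {
>         struct nvme_ns_head *head = q->queuedata;
>         struct nvme_ns *ns = nvme_find_path(head);
>
>         if (!ns) {
>             /* no usable path right now */
>             bio_io_error(bio);
>             return BLK_QC_T_NONE;
>         }
>
>         /* reset bi_disk so the bio goes down the chosen path ... */
>         bio->bi_disk = ns->disk;
>         /*
>          * ... and submit it directly: generic_make_request_checks
>          * already ran on the first submission.
>          */
>         return direct_make_request(bio);
>     }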