NVMeoF multi-path setup
Mike Snitzer
snitzer at redhat.com
Thu Jun 30 15:52:07 PDT 2016
On Thu, Jun 30 2016 at 5:57pm -0400,
Ming Lin <mlin at kernel.org> wrote:
> On Thu, 2016-06-30 at 14:08 -0700, Ming Lin wrote:
> > Hi Mike,
> >
> > I'm trying to test NVMeoF multi-path.
> >
> > root@host:~# lsmod |grep dm_multipath
> > dm_multipath 24576 0
> > root@host:~# ps aux |grep multipath
> > root 13183 0.0 0.1 238452 4972 ? SLl 13:41 0:00
> > /sbin/multipathd
> >
> > I have nvme0 and nvme1 that are 2 paths to the same NVMe subsystem.
> >
> > root@host:/sys/class/nvme# grep . nvme*/address
> > nvme0/address:traddr=192.168.3.2,trsvcid=1023
> > nvme1/address:traddr=192.168.2.2,trsvcid=1023
> >
> > root@host:/sys/class/nvme# grep . nvme*/subsysnqn
> > nvme0/subsysnqn:nqn.testiqn
> > nvme1/subsysnqn:nqn.testiqn
> >
> > root@host:~# /lib/udev/scsi_id --export --whitelisted -d /dev/nvme1n1
> > ID_SCSI=1
> > ID_VENDOR=NVMe
> > ID_VENDOR_ENC=NVMe\x20\x20\x20\x20
> > ID_MODEL=Linux
> > ID_MODEL_ENC=Linux
> > ID_REVISION=0-rc
> > ID_TYPE=disk
> > ID_SERIAL=SNVMe_Linux
> > ID_SERIAL_SHORT=
> > ID_SCSI_SERIAL=1122334455667788
> >
> > root@host:~# /lib/udev/scsi_id --export --whitelisted -d /dev/nvme0n1
> > ID_SCSI=1
> > ID_VENDOR=NVMe
> > ID_VENDOR_ENC=NVMe\x20\x20\x20\x20
> > ID_MODEL=Linux
> > ID_MODEL_ENC=Linux
> > ID_REVISION=0-rc
> > ID_TYPE=disk
> > ID_SERIAL=SNVMe_Linux
> > ID_SERIAL_SHORT=
> > ID_SCSI_SERIAL=1122334455667788
> >
> > But it seems multipathd didn't recognize these two devices.
> >
> > What else am I missing?
>
> There are two problems:
>
> 1. there is no "/block/" in the path
>
> /sys/devices/virtual/nvme-fabrics/block/nvme0/nvme0n1
You clarified that it is:
/sys/devices/virtual/nvme-fabrics/ctl/nvme0/nvme0n1
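(As an aside: you can see the devpath udev hands multipathd with
something like:

  # udevadm info -q path /dev/nvme0n1
  /devices/virtual/nvme-fabrics/ctl/nvme0/nvme0n1

which is exactly what trips the "/block/" check in uev_discard that
you patched out below.)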
Do you have CONFIG_BLK_DEV_NVME_SCSI enabled?
AFAIK, hch had Intel disable that by default in the hopes of avoiding
people having dm-multipath "just work" with NVMeoF. (Makes me wonder
what other unpleasant unilateral decisions were made because some
non-existent NVMe-specific multipath capabilities would be
forthcoming, but I digress.)
My understanding is that enabling CONFIG_BLK_DEV_NVME_SCSI will cause
NVMe to respond favorably to standard SCSI VPD inquiries.
And _yes_, Red Hat will be enabling it so users have options!
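If your kernel has it disabled, rebuilding with this in your .config
should be enough for scsi_id to get usable VPD data back from NVMe
devices (a sketch; the option's exact menu location varies by kernel
version):

  CONFIG_BLK_DEV_NVME_SCSI=y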
Also, just so you're aware, I've staged bio-based dm-multipath support
for the 4.8 merge window. Please see either the 'for-next' or 'dm-4.8'
branch in linux-dm.git:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=for-next
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8
I'd welcome you testing whether bio-based dm-multipath performs better
for you than blk-mq request-based dm-multipath. Both modes (using the
4.8 staged code) can easily be selected per DM multipath device by
adding the table feature "queue_mode bio" or "queue_mode mq"
(made possible with this commit:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=e83068a5faafb8ca65d3b58bd1e1e3959ce1ddce
)
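For example, loading a two-path table with bio-based queueing would
look something like this (an untested sketch; the table name, sector
count, and 259:N path devices are illustrative and must match your
actual setup):

  # echo "0 1875385008 multipath 2 queue_mode bio 0 1 1 round-robin 0 2 1 259:0 1 259:1 1" | dmsetup create mpath_nvme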
> 2. nvme was blacklisted.
>
> I added the quick hack below just to make it work.
>
> root@host:~# cat /proc/partitions
> major minor  #blocks  name
>
>  259        0  937692504 nvme0n1
>  252        0  937692504 dm-0
>  259        1  937692504 nvme1n1
>
> diff --git a/libmultipath/blacklist.c b/libmultipath/blacklist.c
> index 2400eda..a143383 100644
> --- a/libmultipath/blacklist.c
> +++ b/libmultipath/blacklist.c
> @@ -190,9 +190,11 @@ setup_default_blist (struct config * conf)
>  	if (store_ble(conf->blist_devnode, str, ORIGIN_DEFAULT))
>  		return 1;
>  
> +#if 0
>  	str = STRDUP("^nvme.*");
>  	if (!str)
>  		return 1;
>  	if (store_ble(conf->blist_devnode, str, ORIGIN_DEFAULT))
>  		return 1;
> +#endif
That's weird; I'm not sure why that'd be the case... maybe because
NVMeoF hasn't been worked through to "just work" with multipath-tools
yet. Ben? Hannes?
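FWIW, rather than patching libmultipath you should be able to override
the built-in blacklist with a blacklist_exceptions stanza in
/etc/multipath.conf, something like this (an untested sketch):

  blacklist_exceptions {
          devnode "^nvme.*"
  }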
> diff --git a/multipathd/main.c b/multipathd/main.c
> index c0ca571..1364070 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1012,6 +1012,7 @@ uxsock_trigger (char * str, char ** reply, int * len, void * trigger_data)
>  static int
>  uev_discard(char * devpath)
>  {
> +#if 0
>  	char *tmp;
>  	char a[11], b[11];
>  
> @@ -1028,6 +1029,7 @@ uev_discard(char * devpath)
>  		condlog(4, "discard event on %s", devpath);
>  		return 1;
>  	}
> +#endif
>  	return 0;
>  }
Why did you have to comment out this discard code?