Cannot Connect NVMeoF At Certain NR_IO_Queues Values

Max Gurtovoy maxg at mellanox.com
Mon May 14 15:46:49 PDT 2018


Hi Joseph,


On 5/14/2018 8:46 PM, Gruher, Joseph R wrote:
> I'm running Ubuntu 18.04 with the included 4.15.0 kernel and Mellanox CX4 NICs and Intel P4800X SSDs.  I'm using NVMe-CLI v1.5 and nvmetcli v0.6.
> 
> I am getting a connect failure even at a relatively moderate nr_io_queues value such as 8:
> 
> rsa@tppjoe01:~$ sudo nvme connect -t rdma -a 10.6.0.16 -i 8 -n NQN1
> Failed to write to /dev/nvme-fabrics: Invalid cross-device link
> 
> However, it works just fine if I use a smaller value, such as 4:
> 
> rsa@tppjoe01:~$ sudo nvme connect -t rdma -a 10.6.0.16 -i 4 -n NQN1
> rsa@tppjoe01:~$
> 
> Target-side dmesg from a failed attach with -i 8:
> 
> [425470.899691] nvmet: creating controller 1 for subsystem NQN1 for NQN nqn.2014-08.org.nvmexpress:uuid:8d0ac789-9136-4275-a46c-8d1223c8fe84.
> [425471.081358] nvmet: adding queue 1 to ctrl 1.
> [425471.081563] nvmet: adding queue 2 to ctrl 1.
> [425471.081758] nvmet: adding queue 3 to ctrl 1.
> [425471.110059] nvmet_rdma: freeing queue 3
> [425471.110946] nvmet_rdma: freeing queue 1
> [425471.111905] nvmet_rdma: freeing queue 2
> [425471.382128] nvmet_rdma: freeing queue 4
> [425471.522836] nvmet_rdma: freeing queue 5
> [425471.640105] nvmet_rdma: freeing queue 7
> [425471.669427] nvmet_rdma: freeing queue 6
> [425471.670107] nvmet_rdma: freeing queue 0
> [425471.692922] nvmet_rdma: freeing queue 8
> 
> Initiator side dmesg from same attempt:
> 
> [862316.209664] nvme nvme1: creating 8 I/O queues.
> [862316.391411] nvme nvme1: Connect command failed, error wo/DNR bit: -16402
> [862316.406271] nvme nvme1: failed to connect queue: 4 ret=-18
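The initiator's two error lines and the nvme-cli message look like the same errno seen three ways. A minimal C sketch of the arithmetic, assuming the 4.15 fabrics code clears the NVMe "Do Not Retry" bit (0x4000) from the Connect error value before printing it: -18 is -EXDEV on Linux, -18 & ~0x4000 evaluates to -16402 on a two's-complement int, and strerror(EXDEV) is what nvme-cli prints as "Invalid cross-device link". SKETCH_NVME_SC_DNR, masked_connect_error() and explain_errno() are hypothetical helpers for illustration, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed value of the NVMe DNR bit (NVME_SC_DNR in the kernel headers). */
#define SKETCH_NVME_SC_DNR 0x4000

/*
 * Hypothetical helper: model of the "error wo/DNR bit" print, i.e. the
 * error value with the DNR bit cleared. When the value is a negative errno
 * instead of an NVMe status, the mask still hits its two's-complement bits.
 */
static int masked_connect_error(int errval)
{
    return errval & ~SKETCH_NVME_SC_DNR;
}

/* Hypothetical helper: show the three views of one (positive) errno. */
static void explain_errno(int err)
{
    /* errno values are positive; the kernel returns them negated. */
    assert(err > 0);
    printf("ret=-%d -> \"wo/DNR\" value %d, strerror: %s\n",
           err, masked_connect_error(-err), strerror(err));
}
```

Calling explain_errno(EXDEV) from a main() on a glibc system prints `ret=-18 -> "wo/DNR" value -16402, strerror: Invalid cross-device link`; other C libraries may word the strerror text differently.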

IMO this issue was fixed in the mlx5_core function mlx5_get_vector_affinity().
There was a long discussion regarding this fix, and it will be fixed again
in 4.17. After the final fix lands, it should go to the stable kernels as well.
In the meantime, I can suggest a quick workaround if needed (or other
solutions as well):

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0f840ec..dd92cb9 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2236,7 +2236,6 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
         .init_hctx      = nvme_rdma_init_hctx,
         .poll           = nvme_rdma_poll,
         .timeout        = nvme_rdma_timeout,
-       .map_queues     = nvme_rdma_map_queues,
 };
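Why dropping .map_queues can help: with it, nvme-rdma maps CPUs to hardware queues from the RDMA completion-vector affinity masks (blk_mq_rdma_map_queues() assigns every CPU in vector q's mask to queue q, later queues overwriting earlier ones); without it, blk-mq falls back to its default spreading. The C sketch below models both under stated assumptions: 8 CPUs, 8 queues, and made-up affinity masks in which two vectors overlap. If my reading of 4.15 is right, a hardware queue left with no CPUs is the case where blk-mq's per-queue request allocation for the Connect command returns -EXDEV. All names here are hypothetical, and map_round_robin() is a deliberately simplified stand-in for blk_mq_map_queues(), not its real algorithm.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS    8 /* assumed CPU count for the model */
#define SKETCH_MAX_QUEUES 8 /* assumed upper bound on hardware queues */

/*
 * Simplified model of affinity-based mapping: each CPU in vector q's mask is
 * assigned to queue q, and later queues overwrite earlier ones. CPUs covered
 * by no mask stay on queue 0 (assumes a zero-initialized map).
 */
static void map_by_affinity(const uint64_t *vec_mask, unsigned nr_queues,
                            unsigned *mq_map)
{
    for (unsigned cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        mq_map[cpu] = 0;
    for (unsigned q = 0; q < nr_queues; q++)
        for (unsigned cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
            if (vec_mask[q] & (UINT64_C(1) << cpu))
                mq_map[cpu] = q;
}

/*
 * Simplified stand-in for the default mapping: plain round-robin. The real
 * blk_mq_map_queues() also groups SMT siblings; this only keeps the property
 * that every queue gets a CPU when CPUs >= queues.
 */
static void map_round_robin(unsigned nr_queues, unsigned *mq_map)
{
    assert(nr_queues > 0);
    for (unsigned cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        mq_map[cpu] = cpu % nr_queues;
}

/* Count hardware queues that no CPU maps to. */
static unsigned count_unmapped(const unsigned *mq_map, unsigned nr_queues)
{
    unsigned used[SKETCH_MAX_QUEUES] = {0};
    unsigned unmapped = 0;

    assert(nr_queues <= SKETCH_MAX_QUEUES);
    for (unsigned cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        assert(mq_map[cpu] < SKETCH_MAX_QUEUES);
        used[mq_map[cpu]] = 1;
    }
    for (unsigned q = 0; q < nr_queues; q++)
        if (!used[q])
            unmapped++;
    return unmapped;
}
```

With the hypothetical masks {0x01, 0x02, 0x04, 0x08, 0x18, 0x20, 0x40, 0x80}, vector 4 also claims CPU 3, so map_by_affinity() leaves queue 3 with no CPU while map_round_robin() covers all 8 queues. Real mask values on a CX4 depend on the driver and IRQ affinity, so this only illustrates the failure mode.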



-Max.



More information about the Linux-nvme mailing list