SPDK initiators (VMware 7.x) cannot connect to nvmet-rdma.

Max Gurtovoy mgurtovoy at nvidia.com
Thu Sep 2 14:36:16 PDT 2021


On 8/31/2021 4:42 PM, Mark Ruijter wrote:
> When I connect an SPDK initiator, it tries to connect using 1024 connections.
> The Linux target is unable to handle this and returns an error:
>
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
>
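
Note that -12 is -ENOMEM: the RDMA device driver refuses to allocate a
QP with that many work requests. The target sizes the QP directly from
the values parsed out of the CM private data; roughly (a simplified
sketch of the sizing logic in nvmet_rdma_create_queue_ib(), most
details elided):

	struct ib_qp_init_attr qp_attr = { };

	/* +1 for drain */
	qp_attr.cap.max_send_wr = queue->send_queue_size + 1;
	qp_attr.cap.max_recv_wr = 1 + queue->recv_queue_size;

	ret = rdma_create_qp(queue->cm_id, ndev->pd, &qp_attr);
	if (ret)
		pr_err("failed to create_qp ret= %d\n", ret);

So a host asking for a 1024-deep queue on every connection translates
almost 1:1 into the max_send_wr/max_recv_wr the device is asked to
provide, multiplied across all queues.
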
> It is really easy to reproduce the problem, even when not using the SPDK initiator.
>
> Just type:
> nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
> While a Linux initiator attempts to set up 64 connections, SPDK attempts to create 1024.

1024 connections, or is it the queue depth?

How many cores do you have on the initiator?

Can you give more details on the systems?
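
For reference, in NVMe/RDMA each queue is its own RDMA connection, and
the per-queue depth travels in the CM private data of that connection
(struct nvme_rdma_cm_req in include/linux/nvme-rdma.h):

	struct nvme_rdma_cm_req {
		__le16	recfmt;		/* format of the private data */
		__le16	qid;		/* 0 is the admin queue */
		__le16	hrqsize;	/* host receive queue size */
		__le16	hsqsize;	/* host send queue size, 0's based */
		__u8	rsvd[24];
	};

So --queue-size=1024 changes hsqsize/hrqsize on each connection, while
the number of I/O queues is negotiated separately via Set Features
(Number of Queues); hence the question above.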

>
> The result is that anything that relies on SPDK, such as VMware 7.x, won't be able to connect.
> Restricting the queues to a depth of 256 solves some of it: SPDK and VMware then seem to connect.
> See the code section below. Sadly, VMware declares the path dead afterwards. I guess this 'fix' needs more work. ;-(
>
> I noticed that someone reported this problem on the SPDK list:
> https://github.com/spdk/spdk/issues/1719
>
> Thanks,
>
> Mark
>
> ---
> static int
> nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
>                                  struct nvmet_rdma_queue *queue)
> {
>          struct nvme_rdma_cm_req *req;
>
>          req = (struct nvme_rdma_cm_req *)conn->private_data;
>          if (!req || conn->private_data_len == 0)
>                  return NVME_RDMA_CM_INVALID_LEN;
>
>          if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
>                  return NVME_RDMA_CM_INVALID_RECFMT;
>
>          queue->host_qid = le16_to_cpu(req->qid);
>
>          /*
>           * req->hsqsize corresponds to our recv queue size plus 1
>           * req->hrqsize corresponds to our send queue size
>           */
>          queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
>          queue->send_queue_size = le16_to_cpu(req->hrqsize);
>          if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
>                  pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i\n", NVME_RDMA_CM_INVALID_HSQSIZE);
>                  return NVME_RDMA_CM_INVALID_HSQSIZE;
>          }
>
> +        if (queue->recv_queue_size > 256)
> +                queue->recv_queue_size = 256;
> +        if (queue->send_queue_size > 256)
> +                queue->send_queue_size = 256;
> +        pr_info("MARK queue->recv_queue_size = %i\n", queue->recv_queue_size);
> +        pr_info("MARK queue->send_queue_size = %i\n", queue->send_queue_size);
>
>          /* XXX: Should we enforce some kind of max for IO queues? */
>          return 0;
> }
>
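
One plausible reason VMware declares the path dead afterwards: the
clamp above is silent, so the host may still believe its 1024-deep
queues were accepted and can overrun the target side. A sketch of an
alternative (the macro name here is made up; the define would sit near
the top of drivers/nvme/target/rdma.c) that rejects oversized queues
outright instead of silently shrinking them:

+#define NVMET_RDMA_MAX_QUEUE_SIZE	256

+	/* reject oversized I/O queues rather than silently shrink them */
+	if (queue->host_qid &&
+	    (queue->recv_queue_size > NVMET_RDMA_MAX_QUEUE_SIZE ||
+	     queue->send_queue_size > NVMET_RDMA_MAX_QUEUE_SIZE))
+		return NVME_RDMA_CM_INVALID_HSQSIZE;

The status travels back in the CM reject private data (struct
nvme_rdma_cm_rej), so the initiator at least gets an explicit error it
can act on instead of a connection that later misbehaves.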


