SPDK initiators (VMware 7.x) cannot connect to nvmet-rdma.
Max Gurtovoy
mgurtovoy at nvidia.com
Thu Sep 2 14:36:16 PDT 2021
On 8/31/2021 4:42 PM, Mark Ruijter wrote:
> When I connect an SPDK initiator, it tries to connect using 1024 connections.
> The Linux target is unable to handle this and returns an error (-12 is -ENOMEM, from the QP allocation):
>
> Aug 28 14:22:56 crashme kernel: [169366.627010] infiniband mlx5_0: create_qp:2789:(pid 33755): Create QP type 2 failed
> Aug 28 14:22:56 crashme kernel: [169366.627913] nvmet_rdma: failed to create_qp ret= -12
> Aug 28 14:22:56 crashme kernel: [169366.628498] nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
>
> It is really easy to reproduce the problem, even when not using the SPDK initiator.
>
> Just type:
> nvme connect --transport=rdma --queue-size=1024 --nqn=SOME.NQN --traddr=SOME.IP --trsvcid=XXXX
> While a Linux initiator attempts to set up 64 connections, SPDK attempts to create 1024 connections.
Is it 1024 connections, or is it the queue depth?
How many cores does the initiator have?
Can you give more details on the systems?
>
> The result is that anything that relies on SPDK, like VMware 7.x for example, won't be able to connect.
> Forcing the queues to be restricted to a QD of 256 solves some of it; in this case SPDK and VMware seem to connect.
> See the code section below. Sadly, VMware declares the path to be dead afterwards. I guess this 'fix' needs more work. ;-(
>
> I noticed that someone reported this problem on the SPDK list:
> https://github.com/spdk/spdk/issues/1719
>
> Thanks,
>
> Mark
>
> ---
> static int
> nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
> 		struct nvmet_rdma_queue *queue)
> {
> 	struct nvme_rdma_cm_req *req;
>
> 	req = (struct nvme_rdma_cm_req *)conn->private_data;
> 	if (!req || conn->private_data_len == 0)
> 		return NVME_RDMA_CM_INVALID_LEN;
>
> 	if (le16_to_cpu(req->recfmt) != NVME_RDMA_CM_FMT_1_0)
> 		return NVME_RDMA_CM_INVALID_RECFMT;
>
> 	queue->host_qid = le16_to_cpu(req->qid);
>
> 	/*
> 	 * req->hsqsize corresponds to our recv queue size plus 1
> 	 * req->hrqsize corresponds to our send queue size
> 	 */
> 	queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
> 	queue->send_queue_size = le16_to_cpu(req->hrqsize);
> 	if (!queue->host_qid && queue->recv_queue_size > NVME_AQ_DEPTH) {
> 		pr_info("MARK nvmet_rdma_parse_cm_connect_req return %i\n",
> 			NVME_RDMA_CM_INVALID_HSQSIZE);
> 		return NVME_RDMA_CM_INVALID_HSQSIZE;
> 	}
>
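> +	/* quick hack: clamp both queue sizes so the QP allocation succeeds */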
> +	if (queue->recv_queue_size > 256)
> +		queue->recv_queue_size = 256;
> +	if (queue->send_queue_size > 256)
> +		queue->send_queue_size = 256;
> +	pr_info("MARK queue->recv_queue_size = %i\n", queue->recv_queue_size);
> +	pr_info("MARK queue->send_queue_size = %i\n", queue->send_queue_size);
>
> 	/* XXX: Should we enforce some kind of max for IO queues? */
> 	return 0;
> }
>
>
>
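If the target is going to clamp at all, clamping to what the RDMA device actually
advertises, rather than a hardcoded 256, would at least behave consistently across
HCAs. A rough sketch of that idea follows; it assumes an ib_device pointer can be
plumbed into this path (in the stock code the device lookup happens in the CM
handler), and the one-WR headroom is a placeholder that would have to match what
nvmet_rdma_create_queue_ib() really posts on the QP:

static void nvmet_rdma_clamp_queue_size(struct nvmet_rdma_queue *queue,
		struct ib_device *ibdev)
{
	/* leave headroom for the extra WRs the target posts on the same QP */
	int max_wr = ibdev->attrs.max_qp_wr - 1;

	if (queue->recv_queue_size > max_wr) {
		pr_warn("clamping recv queue size %d to device max %d\n",
			queue->recv_queue_size, max_wr);
		queue->recv_queue_size = max_wr;
	}
	if (queue->send_queue_size > max_wr) {
		pr_warn("clamping send queue size %d to device max %d\n",
			queue->send_queue_size, max_wr);
		queue->send_queue_size = max_wr;
	}
}

Note that silently shrinking the queue behind the host's back may be exactly why
VMware marks the path dead afterwards: the host still believes its hsqsize was
accepted. Rejecting the connect with NVME_RDMA_CM_INVALID_HSQSIZE, or advertising
a smaller MQES so the host never asks for 1024 in the first place, may be the
safer direction.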