cannot load more than 51 nvme rdma devices

李春 pickup112 at gmail.com
Thu May 17 00:15:32 PDT 2018


Thanks for your help.


> Any reason for this type of 1:1 configuration?
> Can you expose all 100 disks using 1 subsystem or 10 disks per subsystem?
> Do you understand the difference in resource allocation between both
> cases?

Our actual production environment uses IB switches to interconnect the
cluster, and in that environment we may also expose hundreds of disks to the
same node.
After we found this problem with io_queue, we reproduced it in a test
environment with two directly connected nodes.
We understand the difference you mentioned, but in the actual production
environment we really do load more than 51 targets. So we want to understand
thoroughly why this restriction occurs and whether there is a way to get
around this limitation.


> try to use --queue-size=16 in your connect command.
> You don't really need so many resources (10 io queues with 128 queue-size
> each) to saturate 56Gb wire.

How should we tune nr-io-queues and queue-size? Is there a best practice for
our scenario?
What is the impact of increasing or decreasing queue-size or nr-io-queues?
So if nr_io_queues * queue_size * nvme_connect_number > max_mr, we will hit
the error, correct?
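For example (just to check my understanding, with made-up numbers): with the
defaults we used, nr-io-queues=10 and queue-size=128, each connect would need
about 10 * 128 = 1280 MRs, so an adapter that supported, say, 65536 MRs
(purely an illustrative value) would run out after roughly 65536 / 1280 ≈ 51
connects, which would match the behaviour we see.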

> This is because you try to allocate more MRs than the maximum supported by
> the device.
> In NVMe/RDMA we create "queue-size" number of MRs for each created IO
> queue.

How can we find out the maximum number of MRs supported by the device?
By MR, do you mean Memory Region?
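For what it's worth, something like the small libibverbs program below might
be one way to read the per-device limits (this is only my own sketch, based
on the assumption that ibv_query_device() reports the relevant max_mr
attribute; I believe "ibv_devinfo -v" prints the same field):

/* sketch: print max_mr / max_qp for every RDMA device via libibverbs */
/* build with: gcc query_max_mr.c -o query_max_mr -libverbs           */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
        int num_devices;
        struct ibv_device **dev_list = ibv_get_device_list(&num_devices);

        if (!dev_list) {
                perror("ibv_get_device_list");
                return 1;
        }

        for (int i = 0; i < num_devices; i++) {
                struct ibv_context *ctx = ibv_open_device(dev_list[i]);
                struct ibv_device_attr attr;

                if (!ctx)
                        continue;
                if (!ibv_query_device(ctx, &attr))
                        printf("%s: max_mr=%d max_qp=%d\n",
                               ibv_get_device_name(dev_list[i]),
                               attr.max_mr, attr.max_qp);
                ibv_close_device(ctx);
        }

        ibv_free_device_list(dev_list);
        return 0;
}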

> The max_mr for this adapter is much bigger.
> If the above solutions are not enough, then we can dig-in more to low
> level drivers...

According to what you said above, max_mr is just a hardware attribute, not
related to Linux, nvme, or rdma, isn't it?

Is max_mr an attribute of the whole network card, or of each port on the
card? According to my test here, after one port of a dual-port card reported
this error, the other port could still load a new target.

Can you suggest any information or documentation that describes the
relationship between queue-size, nr_io_queues, and MRs?


-- 
pickup.lichun 李春
