[PATCH v2 0/3] nvmet-rdma: SRQ per completion vector

Leon Romanovsky leonro at mellanox.com
Thu Nov 16 10:08:53 PST 2017


On Thu, Nov 16, 2017 at 07:21:22PM +0200, Max Gurtovoy wrote:
> Since there is an active discussion regarding the CQ pool architecture, I
> decided to send this feature out now (maybe it can even be merged before the
> CQ pool).

Max,

Thanks for CCing me. Can you please repost the series and CC linux-rdma too?

>
> This is a new feature for the NVMEoF RDMA target, intended to save resources
> (by sharing receive buffers) and to exploit the locality of completions, so
> that we get the best performance Shared Receive Queues (SRQs) can offer. We
> create an SRQ per completion vector (rather than one per device) using a new
> API (an SRQ pool, added in this patchset as well) and associate each created
> QP/CQ with an appropriate SRQ. This also reduces contention on the lock of
> the single per-device SRQ (today's solution). A sketch of the idea follows
> below.
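>
> As a minimal sketch of the idea (the helper below is an illustrative
> assumption, not the actual srq_pool.h API added in patch 1/3; it uses only
> the existing ib_create_srq()/ib_destroy_srq() verbs):
>
>     #include <linux/err.h>
>     #include <rdma/ib_verbs.h>
>
>     /* Illustrative only: allocate one SRQ per completion vector on a PD. */
>     static int nvmet_rdma_alloc_srqs(struct ib_pd *pd, struct ib_srq **srqs,
>                                      int nr_vectors, u32 srq_size)
>     {
>             struct ib_srq_init_attr attr = {};
>             int i, ret;
>
>             attr.attr.max_wr = srq_size;    /* recv WRs shared by all QPs */
>             attr.attr.max_sge = 2;          /* e.g. command + inline data */
>
>             for (i = 0; i < nr_vectors; i++) {
>                     srqs[i] = ib_create_srq(pd, &attr);
>                     if (IS_ERR(srqs[i])) {
>                             ret = PTR_ERR(srqs[i]);
>                             goto out_free;
>                     }
>             }
>             return 0;
>
>     out_free:
>             while (--i >= 0)
>                     ib_destroy_srq(srqs[i]);
>             return ret;
>     }
>
> Each QP accepted on completion vector i would then be created with
> qp_init_attr.srq = srqs[i], so its receive completions land on the CQ bound
> to that same vector.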
>
> My testing environment included 4 initiators (CX5, CX5, CX4, CX3) connected
> through a switch to the NVMEoF target (CX5) via 2 ports. Each initiator was
> connected to a unique subsystem (4 subsystems in total, 1 namespace per
> subsystem, each backed by a different null_blk device). I used the RoCE link
> layer.
>
> Configuration:
>  - irqbalance stopped on each server
>  - set_irq_affinity.sh run on each interface
>  - 2 initiators ran traffic through port 1
>  - 2 initiators ran traffic through port 2
>  - register_always=N set on the initiators
>  - fio with 12 jobs, iodepth 128 (an illustrative invocation is sketched below)
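>
> For reference, an fio invocation along these lines was used on each
> initiator (the device path and any flags beyond those stated above are
> illustrative assumptions):
>
>     fio --name=nvmf-test --ioengine=libaio --direct=1 --rw=randread \
>         --bs=1k --numjobs=12 --iodepth=128 --filename=/dev/nvme0n1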
>
> Memory consumption calculation for recv buffers (target). Here comp_num is
> the number of completion vectors (24 in this setup), CPU_num is the number of
> CPUs (24), ib_devs_num is the number of RDMA devices (2), ctrl_num is the
> number of controllers, and inline_buffer_size is 4KiB:
>  - Multiple SRQ: SRQ_size * comp_num * ib_devs_num * inline_buffer_size
>  - Single SRQ: SRQ_size * 1 * ib_devs_num * inline_buffer_size
>  - MQ: RQ_size * CPU_num * ctrl_num * inline_buffer_size
>
> Cases (the arithmetic is double-checked by the small program after this list):
>  1. Multiple SRQ with 1024 entries:
>     - Mem = 1024 * 24 * 2 * 4k = 192MiB (constant; does not depend on the number of initiators)
>  2. Multiple SRQ with 256 entries:
>     - Mem = 256 * 24 * 2 * 4k = 48MiB (constant; does not depend on the number of initiators)
>  3. MQ:
>     - Mem = 256 * 24 * 8 * 4k = 192MiB (memory grows with every newly created controller)
>  4. Single SRQ (current SRQ implementation):
>     - Mem = 4096 * 1 * 2 * 4k = 32MiB (constant; does not depend on the number of initiators)
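>
> A trivial userspace check of the arithmetic above (values hard-coded from
> the cases; illustrative only):
>
>     #include <stdio.h>
>
>     int main(void)
>     {
>             const long kib = 1024, mib = 1024 * kib;
>
>             /* SRQ_size * comp_num * ib_devs_num * inline_buffer_size */
>             printf("1. multiple SRQ (1024): %ld MiB\n",
>                    1024L * 24 * 2 * 4 * kib / mib);
>             printf("2. multiple SRQ (256):  %ld MiB\n",
>                    256L * 24 * 2 * 4 * kib / mib);
>             /* RQ_size * CPU_num * ctrl_num * inline_buffer_size */
>             printf("3. MQ:                  %ld MiB\n",
>                    256L * 24 * 8 * 4 * kib / mib);
>             /* SRQ_size * 1 * ib_devs_num * inline_buffer_size */
>             printf("4. single SRQ:          %ld MiB\n",
>                    4096L * 1 * 2 * 4 * kib / mib);
>             return 0;
>     }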
>
> Results (throughput in millions of IOPS, with target CPU utilization in
> parentheses; columns 1-4 refer to the cases above):
>
> BS    1.read (target CPU)   2.read (target CPU)   3.read (target CPU)   4.read (target CPU)
> ---   -------------------   -------------------   -------------------   -------------------
> 1k    5.88M (80%)           5.45M (72%)           6.77M (91%)           2.2M (72%)
> 2k    3.56M (65%)           3.45M (59%)           3.72M (64%)           2.12M (59%)
> 4k    1.8M (33%)            1.87M (32%)           1.88M (32%)           1.59M (34%)
>
> BS    1.write (target CPU)  2.write (target CPU)  3.write (target CPU)  4.write (target CPU)
> ---   --------------------  --------------------  --------------------  --------------------
> 1k    5.42M (63%)           5.14M (55%)           7.75M (82%)           2.14M (74%)
> 2k    4.15M (56%)           4.14M (51%)           4.16M (52%)           2.08M (73%)
> 4k    2.17M (28%)           2.17M (27%)           2.16M (28%)           1.62M (24%)
>
>
> We can see the perf improvement between Case 2 and Case 4 (the same order of
> resource consumption). Between Cases 2 and 3 we can see the benefit in
> resource consumption (memory and CPU) at the cost of a small perf loss.
> There is still an open question about the perf difference at 1k between
> Case 1 and Case 3, but I guess we can investigate and improve it
> incrementally.
>
> Thanks to Idan Burstein and Oren Duer for suggesting this nice feature.
>
> Changes from V1:
>  - Added an SRQ pool per protection domain to IB/core
>  - Addressed a few review comments from Christoph and Sagi
>
> Max Gurtovoy (3):
>   IB/core: add a simple SRQ pool per PD
>   nvmet-rdma: use srq pointer in rdma_cmd
>   nvmet-rdma: use SRQ per completion vector
>
>  drivers/infiniband/core/Makefile   |   2 +-
>  drivers/infiniband/core/srq_pool.c | 106 +++++++++++++++++++++
>  drivers/infiniband/core/verbs.c    |   4 +
>  drivers/nvme/target/rdma.c         | 190 +++++++++++++++++++++++++++----------
>  include/rdma/ib_verbs.h            |   5 +
>  include/rdma/srq_pool.h            |  46 +++++++++
>  6 files changed, 301 insertions(+), 52 deletions(-)
>  create mode 100644 drivers/infiniband/core/srq_pool.c
>  create mode 100644 include/rdma/srq_pool.h
>
> --
> 1.8.3.1
>


