[PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
Leon Romanovsky
leonro at mellanox.com
Thu Nov 16 10:08:53 PST 2017
On Thu, Nov 16, 2017 at 07:21:22PM +0200, Max Gurtovoy wrote:
> Since there is an active discussion regarding the CQ pool architecture, I decided to push
> this feature separately (maybe it can be merged before the CQ pool).
Max,
Thanks for CCing me, can you please repost the series and CC linux-rdma too?
>
> This is a new feature for the NVMEoF RDMA target, intended to save resources
> (by sharing receive buffers) and to exploit completion locality to get the best
> performance with Shared Receive Queues (SRQs). We create an SRQ per completion
> vector (instead of one per device), using a new API (an SRQ pool, also added in
> this patchset), and associate each created QP/CQ with the appropriate SRQ. This
> also reduces the lock contention on the single per-device SRQ (today's
> implementation).
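> To make the idea concrete, here is a rough, illustrative sketch of allocating
> one SRQ per completion vector using only the existing verbs API (this is NOT
> the code in this series, which goes through the new SRQ pool; the
> nvmet_rdma_alloc_srqs name and the srq_size parameter are invented for the
> example):
>
>     /* Illustrative only: per-vector SRQ allocation with plain ib_create_srq().
>      * The actual series adds an SRQ pool per PD instead; names here are made up.
>      */
>     #include <rdma/ib_verbs.h>
>
>     static int nvmet_rdma_alloc_srqs(struct ib_device *dev, struct ib_pd *pd,
>                                      struct ib_srq **srqs, u32 srq_size)
>     {
>             struct ib_srq_init_attr attr = {};
>             int i, ret;
>
>             attr.attr.max_wr = srq_size;    /* recv WRs posted per SRQ */
>             attr.attr.max_sge = 1;          /* one SGE for the inline data buffer */
>
>             /* one SRQ per completion vector instead of one per device */
>             for (i = 0; i < dev->num_comp_vectors; i++) {
>                     srqs[i] = ib_create_srq(pd, &attr);
>                     if (IS_ERR(srqs[i])) {
>                             ret = PTR_ERR(srqs[i]);
>                             goto out_free;
>                     }
>             }
>             return 0;
>
>     out_free:
>             while (--i >= 0)
>                     ib_destroy_srq(srqs[i]);
>             return ret;
>     }
>
> Each QP created for a connection would then point at the SRQ matching its
> completion vector (qp_init_attr.srq = srqs[comp_vector]), which is what keeps
> receive buffers and completions local to the same vector.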
>
> My testing environment included 4 initiators (CX5, CX5, CX4, CX3) connected
> through 2 ports, via a switch, to the NVMEoF target (CX5). Each initiator was
> connected to a unique subsystem (4 subsystems in total, one namespace per
> subsystem, each backed by a different null_blk device). The link layer was RoCE.
>
> Configuration:
> - irqbalance stopped on each server
> - set_irq_affinity.sh run on each interface
> - 2 initiators ran traffic through port 1
> - 2 initiators ran traffic through port 2
> - register_always=N set on the initiators
> - fio with 12 jobs, iodepth 128
>
> Memory consumption calculation for recv buffers (target):
> - Multiple SRQ: SRQ_size * comp_num * ib_devs_num * inline_buffer_size
> - Single SRQ: SRQ_size * 1 * ib_devs_num * inline_buffer_size
> - MQ: RQ_size * CPU_num * ctrl_num * inline_buffer_size
>
> Cases (a quick sanity check of the arithmetic follows this list):
> 1. Multiple SRQs with 1024 entries each:
> - Mem = 1024 * 24 * 2 * 4k = 192MiB (constant; does not depend on the number of initiators)
> 2. Multiple SRQs with 256 entries each:
> - Mem = 256 * 24 * 2 * 4k = 48MiB (constant; does not depend on the number of initiators)
> 3. MQ:
> - Mem = 256 * 24 * 8 * 4k = 192MiB (memory grows with every newly created ctrl)
> 4. Single SRQ (current SRQ implementation):
> - Mem = 4096 * 1 * 2 * 4k = 32MiB (constant; does not depend on the number of initiators)
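> For reference, the four footprints above can be recomputed with a few lines of
> C (illustrative only; the 24 completion vectors, 2 IB devices, 8 ctrls and 4k
> inline buffers are taken straight from the cases above):
>
>     /* Recomputes the four recv-buffer memory footprints listed above. */
>     #include <stdio.h>
>
>     static unsigned long long mib(unsigned long long entries,
>                                   unsigned long long queues,
>                                   unsigned long long devs_or_ctrls)
>     {
>             /* entries * queues * devices (or ctrls) * 4 KiB, reported in MiB */
>             return entries * queues * devs_or_ctrls * 4096ULL >> 20;
>     }
>
>     int main(void)
>     {
>             printf("1. multiple SRQs, 1024 entries: %lluMiB\n", mib(1024, 24, 2)); /* 192 */
>             printf("2. multiple SRQs, 256 entries:  %lluMiB\n", mib(256, 24, 2));  /* 48  */
>             printf("3. MQ, 8 ctrls:                 %lluMiB\n", mib(256, 24, 8));  /* 192 */
>             printf("4. single SRQ, 4096 entries:    %lluMiB\n", mib(4096, 1, 2));  /* 32  */
>             return 0;
>     }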
>
> Results (the numbered columns correspond to the cases above):
>
> BS    1.read (target CPU)   2.read (target CPU)   3.read (target CPU)   4.read (target CPU)
> ---   -------------------   -------------------   -------------------   -------------------
> 1k    5.88M (80%)           5.45M (72%)           6.77M (91%)           2.2M  (72%)
> 2k    3.56M (65%)           3.45M (59%)           3.72M (64%)           2.12M (59%)
> 4k    1.8M  (33%)           1.87M (32%)           1.88M (32%)           1.59M (34%)
>
> BS    1.write (target CPU)  2.write (target CPU)  3.write (target CPU)  4.write (target CPU)
> ---   -------------------   -------------------   -------------------   -------------------
> 1k    5.42M (63%)           5.14M (55%)           7.75M (82%)           2.14M (74%)
> 2k    4.15M (56%)           4.14M (51%)           4.16M (52%)           2.08M (73%)
> 4k    2.17M (28%)           2.17M (27%)           2.16M (28%)           1.62M (24%)
>
>
> We can see the perf improvement between Case 2 and Case 4 (same order of
> resource consumption). We can also see the benefit in resource consumption
> (memory and CPU) with a small perf loss between Cases 2 and 3.
> There is still an open question regarding the 1k perf difference between
> Case 1 and Case 3, but I guess we can investigate and improve it incrementally.
>
> Thanks to Idan Burstein and Oren Duer for suggesting this nice feature.
>
> Changes from V1:
> - Added SRQ pool per protection domain for IB/core
> - Addressed a few comments from Christoph and Sagi
>
> Max Gurtovoy (3):
> IB/core: add a simple SRQ pool per PD
> nvmet-rdma: use srq pointer in rdma_cmd
> nvmet-rdma: use SRQ per completion vector
>
> drivers/infiniband/core/Makefile | 2 +-
> drivers/infiniband/core/srq_pool.c | 106 +++++++++++++++++++++
> drivers/infiniband/core/verbs.c | 4 +
> drivers/nvme/target/rdma.c | 190 +++++++++++++++++++++++++++----------
> include/rdma/ib_verbs.h | 5 +
> include/rdma/srq_pool.h | 46 +++++++++
> 6 files changed, 301 insertions(+), 52 deletions(-)
> create mode 100644 drivers/infiniband/core/srq_pool.c
> create mode 100644 include/rdma/srq_pool.h
>
> --
> 1.8.3.1
>