[PATCH 2/2] nvme-rdma: Add remote_invalidation module parameter
Chuck Lever
chuck.lever at oracle.com
Sun Oct 29 11:24:23 PDT 2017
> On Oct 29, 2017, at 12:38 PM, idanb at mellanox.com wrote:
>
> From: Idan Burstein <idanb at mellanox.com>
>
> NVMe over Fabrics in its secure "register_always" mode
> registers and invalidates the user buffer upon each I/O.
> The protocol enables the host to request that the subsystem
> use a SEND WITH INVALIDATE operation when returning the
> response capsule, invalidating the local key
> (remote invalidation).
> In some HW implementations, the local network adapter may
> perform better when using local invalidation operations.
>
> The results below show that running with local invalidation
> rather than remote invalidation improves the IOPS achievable
> with a ConnectX-5 Ex network adapter by a factor of 1.36.
> Nevertheless, local invalidation induces more CPU overhead
> than having the target invalidate remotely; because of this
> CPU% vs. IOPS tradeoff, we propose a module parameter to
> control whether remote invalidation is requested.
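
To make the difference concrete for folks following along: at the
verbs level this is a choice between the target posting its response
with SEND_WITH_INVALIDATE, and the host posting a LOCAL_INV work
request itself after the response arrives. A rough sketch of the two
paths (function names are illustrative, completion and error handling
elided):

/* Remote invalidation: the target sends the response capsule with
 * SEND_WITH_INVALIDATE, so the host HCA invalidates the host's rkey
 * as part of receiving the response.
 */
static void target_send_with_inv(struct ib_qp *qp, struct ib_sge *sge,
				 u32 host_rkey)
{
	struct ib_send_wr wr = {
		.opcode		    = IB_WR_SEND_WITH_INV,
		.sg_list	    = sge,
		.num_sge	    = 1,
		.send_flags	    = IB_SEND_SIGNALED,
		/* rkey taken from the command's SGL descriptor */
		.ex.invalidate_rkey = host_rkey,
	};
	struct ib_send_wr *bad_wr;

	ib_post_send(qp, &wr, &bad_wr);
}

/* Local invalidation: the host posts LOCAL_INV itself after the
 * response arrives, before the MR can be reused.
 */
static void host_local_inv(struct ib_qp *qp, struct ib_mr *mr)
{
	struct ib_send_wr wr = {
		.opcode		    = IB_WR_LOCAL_INV,
		.send_flags	    = IB_SEND_SIGNALED,
		.ex.invalidate_rkey = mr->rkey,
	};
	struct ib_send_wr *bad_wr;

	ib_post_send(qp, &wr, &bad_wr);
}
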
>
> The following results were taken against a single nvme over fabrics
> subsystem with a single namespace backed by null_blk:
>
> Block Size   s/g reg_wr      inline reg_wr    inline reg_wr + local inv
> ++++++++++   ++++++++++++++  ++++++++++++++   +++++++++++++++++++++++++
> 512B         1446.6K/8.57%   5224.2K/76.21%   7143.3K/79.72%
> 1KB          1390.6K/8.5%    4656.7K/71.69%   5860.6K/55.45%
> 2KB          1343.8K/8.6%    3410.3K/38.96%   4106.7K/55.82%
> 4KB          1254.8K/8.39%   2033.6K/15.86%   2165.3K/17.48%
> 8KB          1079.5K/7.7%    1143.1K/7.27%    1158.2K/7.33%
> 16KB         603.4K/3.64%    593.8K/3.4%      588.9K/3.77%
> 32KB         294.8K/2.04%    293.7K/1.98%     294.4K/2.93%
> 64KB         138.2K/1.32%    141.6K/1.26%     135.6K/1.34%
Are the units reported here KIOPS and %CPU? What was the benchmark?

Was any root-cause analysis done to understand why the IOPS rate
changes without remote invalidation? A reduction in average RTT?
Fewer long-running outliers? Lock contention in the ULP?
I am curious enough to add a similar setting to NFS/RDMA,
now that I have mlx5 devices.
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> Signed-off-by: Idan Burstein <idanb at mellanox.com>
> ---
> drivers/nvme/host/rdma.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 92a03ff..7f8225d 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -146,6 +146,11 @@ static inline struct nvme_rdma_ctrl *to_rdma_ctrl(struct nvme_ctrl *ctrl)
> MODULE_PARM_DESC(register_always,
> "Use memory registration even for contiguous memory regions");
>
> +static bool remote_invalidation = true;
> +module_param(remote_invalidation, bool, 0444);
> +MODULE_PARM_DESC(remote_invalidation,
> + "request remote invalidation from subsystem (default: true)");
The use of a module parameter would be awkward on systems
that have a heterogeneous collection of HCAs.
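
A per-connection or per-device policy might fit better. As a rough
sketch only (the helper below is hypothetical, not an existing
interface, and the mlx5 name check is just an example policy), the
host could decide per queue based on the device behind it:

/* Hypothetical: make remote invalidation a per-queue decision
 * instead of a global module parameter.  A real patch would more
 * likely key off a connect-time option or a device capability.
 */
static bool nvme_rdma_want_remote_inv(struct nvme_rdma_queue *queue)
{
	struct ib_device *ibdev = queue->device->dev;

	/* Example policy: assume mlx5 HCAs invalidate faster locally. */
	if (!strncmp(ibdev->name, "mlx5", 4))
		return false;

	return true;
}
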
> +
> static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
> struct rdma_cm_event *event);
> static void nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
> @@ -1152,8 +1157,9 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
> sg->addr = cpu_to_le64(req->mr->iova);
> put_unaligned_le24(req->mr->length, sg->length);
> put_unaligned_le32(req->mr->rkey, sg->key);
> - sg->type = (NVME_KEY_SGL_FMT_DATA_DESC << 4) |
> - NVME_SGL_FMT_INVALIDATE;
> + sg->type = NVME_KEY_SGL_FMT_DATA_DESC << 4;
> + if (remote_invalidation)
> + sg->type |= NVME_SGL_FMT_INVALIDATE;
>
> return 0;
> }
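
One thing to double-check with a tunable like this: the host's
completion path has to handle both outcomes, since the target may or
may not invalidate even when asked. A hedged sketch of the receive-
side logic this implies (the ib_wc fields are real; the surrounding
function and the host_local_inv() call from the sketch above are
illustrative):

/* Illustrative only: picking between the two invalidation outcomes
 * when the response capsule's receive completion arrives.
 */
static void handle_resp_completion(struct ib_wc *wc,
				   struct nvme_rdma_request *req)
{
	if (wc->wc_flags & IB_WC_WITH_INVALIDATE) {
		/* Target honored the request: the HCA has already
		 * invalidated the rkey.  Sanity-check it is ours.
		 */
		WARN_ON_ONCE(wc->ex.invalidate_rkey != req->mr->rkey);
	} else {
		/* No remote invalidation (not requested, or the
		 * target declined): post IB_WR_LOCAL_INV before
		 * the MR can be reused.
		 */
		host_local_inv(req->queue->qp, req->mr);
	}
}
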
> --
> 1.8.3.1
--
Chuck Lever