[PATCH] nvme-rdma: Distribute I/O queues across more completion vectors

Thu Jan 19 16:16:59 PST 2023

Currently, nvme I/O queues are assigned to RDMA completion vectors by
their queue index, spreading them from vector 0 to (number of I/O
queues - 1), assuming the device exposes enough vectors. As a result, work
is distributed across as many CPUs as there are I/O queues, assuming
enough CPUs are available. Because each additional device maps its I/O
queues onto those same vectors, the number of CPUs in use does not grow
with the number of devices, which eventually leads to avoidable resource
contention.

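This patch implements a simple I/O queue distribution method that reduces
contention in the common case: a counter shared by all controllers assigns
I/O queues round-robin across all available completion vectors, capped at
the number of online CPUs. The admin queue stays on vector 0.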

Signed-off-by: Kyle Smith <kyles at hpe.com>
---
 drivers/nvme/host/rdma.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index bbad26b82b56..116616510a6b 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -463,14 +463,20 @@ static int nvme_rdma_get_max_fr_pages(struct ib_device *ibdev, bool pi_support)
 static int nvme_rdma_create_cq(struct ib_device *ibdev,
 		struct nvme_rdma_queue *queue)
 {
-	int ret, comp_vector, idx = nvme_rdma_queue_idx(queue);
+	int idx = nvme_rdma_queue_idx(queue);
 	enum ib_poll_context poll_ctx;
+	static atomic_t counter;
+	int comp_vector = 0;
+	int ret;
 
 	/*
-	 * Spread I/O queues completion vectors according their queue index.
+	 * Spread I/O queue completion vectors across all device vectors / CPUs.
 	 * Admin queues can always go on completion vector 0.
 	 */
-	comp_vector = (idx == 0 ? idx : idx - 1) % ibdev->num_comp_vectors;
+	if (idx != 0) {
+		comp_vector = atomic_inc_return(&counter) %
+			min_t(int, ibdev->num_comp_vectors, num_online_cpus());
+	}
 
 	/* Polling queues need direct cq polling context */
 	if (nvme_rdma_poll_queue(queue)) {