[PATCH] nvmet-tcp: enable optional queue idle period tracking

Sagi Grimberg sagi at grimberg.me
Mon Feb 15 16:18:17 EST 2021



On 2/11/21 3:42 PM, Wunderlich, Mark wrote:
> nvmet-tcp: enable optional queue idle period tracking
> 
> Add 'queue idle period' option used by io_work() to support
> network devices enabled with advanced interrupt moderation
> supporting a relaxed interrupt model. It was discovered that
> such a NIC used on the target was unable to support initiator
> connection establishment, caused by the existing io_work()
> flow that immediately exits and does not re-queue itself at
> the first loop with no activity.
> 
> With this new option a queue is assigned a period of time
> that no activity must occur in order to become 'idle'.  Until
> the queue is idle the work item is requeued.
> 
> The new module option is defined as changeable making it
> flexible for testing purposes.
> 
> The pre-existing legacy behavior is preserved when no module option
> for queue idle period is specified.
> 
> Signed-off-by: Mark Wunderlich <mark.wunderlich at intel.com>
> ---
> Testing was performed with a NIC using standard HW interrupt mode, with
> and without the new module option enabled.  No measurable performance
> drop was seen when the patch was applied and the new option specified
> or not.  A side effect of a standard NIC using the new option
> will reduce the context switch rate.  We measured a drop from roughly
> 90K to less than 300 (for 32 active connections).

Nice!

> 
> For a NIC using a passive advanced interrupt moderation policy, it was
> then successfully able to achieve and maintain active connections with
> the target.
> ---
>   drivers/nvme/target/tcp.c |   65 ++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 64 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index dc1f0f647189..f5e6e2806841 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -29,6 +29,16 @@ static int so_priority;
>   module_param(so_priority, int, 0644);
>   MODULE_PARM_DESC(so_priority, "nvmet tcp socket optimize priority");
>   
> +/*
> + * Define a time period (in msecs)

Is this in msecs or usecs? Anyway I think it should be in the name as
well.

  that io_work() shall sample an activated
> + * queue before determining it to be idle.  This optional module behavior
> + * can enable NIC solutions that support socket optimized packet processing
> + * using advanced interrupt moderation techniques.
> + */
> +static int queue_idle_period;
> +module_param(queue_idle_period, int, 0644);
> +MODULE_PARM_DESC(queue_idle_period, "nvmet tcp io_work queue idle time period");

I think this should be called "queue_poll_period" maybe? because this is
the period a queue will keep actively trying to read data from the
socket regardless if there is any activity or not right?

Or does this mean a different semantics with this?

> +
>   #define NVMET_TCP_RECV_BUDGET		8
>   #define NVMET_TCP_SEND_BUDGET		8
>   #define NVMET_TCP_IO_WORK_BUDGET	64
> @@ -96,6 +106,7 @@ struct nvmet_tcp_queue {
>   	struct work_struct	io_work;
>   	struct nvmet_cq		nvme_cq;
>   	struct nvmet_sq		nvme_sq;
> +	unsigned long           idle_period;

poll_period?

>   
>   	/* send state */
>   	struct nvmet_tcp_cmd	*cmds;
> @@ -1198,12 +1209,42 @@ static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
>   	spin_unlock(&queue->state_lock);
>   }
>   
> +/*
> + * This worker function will process all send and recv packet
> + * activity for a queue. It will loop on the queue for up to a
> + * given maximum operation budget, or until there is no activity
> + * during a single loop iteration.
> + *
> + * Two exit modes are possible.
> + *
> + * The default 'pending' mode where the worker will re-queue
> + * itself, after exiting the work loop, only if any send or recv
> + * activity was recorded during the last pass within the loop.
> + *
> + * A optional 'idle period' mode where in addition to re-queueing
> + * itself because of activity it also tracks if a queue has not reached an
> + * assigned 'idle' period of time. The worker consumes from the assigned
> + * time period, across many potential invocations, until it is expired.
> + * A queue with activity always being awarded a fresh time
> + * period for processing.
> + */
>   static void nvmet_tcp_io_work(struct work_struct *w)
>   {
>   	struct nvmet_tcp_queue *queue =
>   		container_of(w, struct nvmet_tcp_queue, io_work);
>   	bool pending;
>   	int ret, ops = 0;
> +	long period_remaining;
> +	unsigned long idle_deadline;
> +
> +	/* Setup use of optional tracking for idle time period */

Redundant comment.

> +	if (queue_idle_period) {
> +		/* Assign the queue an idle period if not already set */
> +		if (!queue->idle_period)
> +			queue->idle_period =
> +					usecs_to_jiffies(queue_idle_period);
> +		idle_deadline = jiffies + queue->idle_period;

Why do you need to keep a period time?

Why don't you just maintain queue->poll_start and just check if
your past the queue_poll time addition:

	if (queue_idle_period && queue->start_poll == 0)
		queue->start_poll = jiffies;

...
[io_work loop]
...

	if (queue_poll_period) {
		if (!time_after(jiffies, queue->start_poll +
				usecs_to_jiffies(queue_poll_period)))
			pending = true;
		else
			queue->start_poll = 0;
	}

Can this work?

> +	}
>   
>   	do {
>   		pending = false;
> @@ -1222,8 +1263,30 @@ static void nvmet_tcp_io_work(struct work_struct *w)
>   
>   	} while (pending && ops < NVMET_TCP_IO_WORK_BUDGET);
>   
> +	/* If optional deadline mode active, determine if queue has reached its
> +	 * idle process deadline limit.  Remaining deadline is calculated. Any ops
> +	 * activity awards the queue a new deadline period.
> +	 */

Comments should stay consistent in style...

> +	if (queue_idle_period) {
> +		/*
> +		 * Clear to award active non-idle queue new period, or
> +		 * reset for future queue activity after exit when idle reached.
> +		 */
> +		queue->idle_period = 0;
> +		if (ops > 0) {
> +			pending = true;

What is this useful for?

> +		} else if (!time_after(jiffies, idle_deadline)) {
> +			period_remaining = (long)(idle_deadline - jiffies);
> +			if (period_remaining > 0) {
> +				pending = true;
> +				queue->idle_period = period_remaining;
> +			}
> +		}
> +	}

See above if it is simpler?

> +
>   	/*
> -	 * We exahusted our budget, requeue our selves
> +	 * We requeue ourself when pending indicates there was activity
> +	 * recorded, or queue has not reached optional idle time period.
>   	 */
>   	if (pending)
>   		queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);



More information about the Linux-nvme mailing list