[PATCH V3] nvmet-tcp: enable optional queue idle period tracking

Sagi Grimberg sagi at grimberg.me
Tue Mar 23 22:52:41 GMT 2021


Hey Mark,

> nvmet-tcp: enable optional queue idle period tracking
> 
> Add an 'idle_poll_period_usecs' option used by io_work() to support
> network devices that implement advanced interrupt moderation with a
> relaxed interrupt model.  It was discovered that such a NIC used on
> the target could not sustain initiator connection establishment,
> because the existing io_work() flow exits immediately after a loop
> iteration with no activity and does not re-queue itself.
> 
> With this new option, a queue is assigned a period of time in which
> no activity must occur before it is considered 'idle'.  Until the
> queue becomes idle, the work item is requeued.
> 
> The new module option is defined as runtime-changeable, making it
> flexible for testing purposes.
> 
> The pre-existing legacy behavior is preserved when no module option
> for idle_poll_period_usecs is specified.
> 
> Signed-off-by: Mark Wunderlich <mark.wunderlich at intel.com>
> ---

It's easier to read here with the format:
Changes from v2:
- ...
- ...

Changes from v1:
- ...
- ...

> V2 of this patch removes the accounting that deducted time from the
> idle deadline period only during io_work activity.  The result is a
> simpler solution, requiring only the selection of a sufficient
> optional time period that will catch any non-idle activity and keep
> a queue active.
> 
> Testing was performed with a NIC using standard HW interrupt mode, with
> and without the new module option enabled.  No measurable performance
> drop was seen when the patch was applied, whether the new option was
> specified or not.  A side effect of using the new option with a
> standard NIC is a reduced context switch rate; we measured a drop from
> roughly 90K to less than 300 (for 32 active connections).
> 
> A NIC using a passive advanced interrupt moderation policy was then
> able to successfully establish and maintain active connections with
> the target.
> ---
> V3 of this patch simplifies things a bit more, pulling the tracking
> code out of io_work() and into two support functions.  A single test
> is now made after the process loop in io_work() to determine whether
> optional idle tracking is active.  The base logic of using idle
> tracking as a condition to re-queue the worker remains the same.
> ---
>   drivers/nvme/target/tcp.c |   44 +++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index dc1f0f647189..0e86115af9f4 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -29,6 +29,16 @@ static int so_priority;
>   module_param(so_priority, int, 0644);
>   MODULE_PARM_DESC(so_priority, "nvmet tcp socket optimize priority");
>   
> +/* Define a time period (in usecs) that io_work() shall sample an activated
> + * queue before determining it to be idle.  This optional module behavior
> + * can enable NIC solutions that support socket optimized packet processing
> + * using advanced interrupt moderation techniques.
> + */
> +static int idle_poll_period_usecs;
> +module_param(idle_poll_period_usecs, int, 0644);
> +MODULE_PARM_DESC(idle_poll_period_usecs,
> +		"nvmet tcp io_work poll till idle time period in usecs");
> +
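
A note for anyone testing this: since the param is 0644, it should also
be adjustable at runtime through the usual module_param sysfs path,
e.g. echo 5000 > /sys/module/nvmet_tcp/parameters/idle_poll_period_usecs,
so no module reload is needed between runs.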
>   #define NVMET_TCP_RECV_BUDGET		8
>   #define NVMET_TCP_SEND_BUDGET		8
>   #define NVMET_TCP_IO_WORK_BUDGET	64
> @@ -119,6 +129,9 @@ struct nvmet_tcp_queue {
>   	struct ahash_request	*snd_hash;
>   	struct ahash_request	*rcv_hash;
>   
> +	unsigned long		poll_start;
> +	unsigned long		poll_end;
> +
>   	spinlock_t		state_lock;
>   	enum nvmet_tcp_queue_state state;
>   
> @@ -1198,11 +1211,34 @@ static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
>   	spin_unlock(&queue->state_lock);
>   }
>   
> +static inline void nvmet_tcp_arm_queue_deadline(struct nvmet_tcp_queue *queue)
> +{
> +	queue->poll_start = jiffies;
> +	queue->poll_end = queue->poll_start +
> +			usecs_to_jiffies(idle_poll_period_usecs);
> +}
> +
> +static bool nvmet_tcp_check_queue_deadline(struct nvmet_tcp_queue *queue,
> +		int ops)
> +{
> +	if (!idle_poll_period_usecs)
> +		return false;
> +
> +	if (ops || !queue->poll_start)
> +		nvmet_tcp_arm_queue_deadline(queue);
> +
> +	if (!time_in_range(jiffies, queue->poll_start, queue->poll_end)) {
> +		queue->poll_start = queue->poll_end = 0;
> +		return false;
> +	}
> +	return true;

Can this be made simpler somehow, without clearing the poll limits and
then looking at poll_start to indicate that?
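
Maybe something like this (untested sketch): keep only poll_end, re-arm
it whenever there was activity, and let time_after() decide:

	static inline void nvmet_tcp_arm_queue_deadline(struct nvmet_tcp_queue *queue)
	{
		queue->poll_end = jiffies + usecs_to_jiffies(idle_poll_period_usecs);
	}

	static bool nvmet_tcp_check_queue_deadline(struct nvmet_tcp_queue *queue,
			int ops)
	{
		if (!idle_poll_period_usecs)
			return false;

		/* any activity in this pass re-arms the idle deadline */
		if (ops)
			nvmet_tcp_arm_queue_deadline(queue);

		/* keep requeueing while we are inside the grace period */
		return !time_after(jiffies, queue->poll_end);
	}

poll_start goes away entirely; the queue would just need its deadline
armed once when it is accepted, so the first pass sees a valid poll_end.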

> +}
> +
>   static void nvmet_tcp_io_work(struct work_struct *w)
>   {
>   	struct nvmet_tcp_queue *queue =
>   		container_of(w, struct nvmet_tcp_queue, io_work);
> -	bool pending;
> +	bool pending, requeue;
>   	int ret, ops = 0;
>   
>   	do {
> @@ -1223,9 +1259,11 @@ static void nvmet_tcp_io_work(struct work_struct *w)
>   	} while (pending && ops < NVMET_TCP_IO_WORK_BUDGET);
>   
>   	/*
> -	 * We exahusted our budget, requeue our selves
> +	 * Requeue the worker if idle deadline period is in progress or any
> +	 * ops activity was recorded during the do-while loop above.
>   	 */
> -	if (pending)
> +	requeue = nvmet_tcp_check_queue_deadline(queue, ops) || pending;
> +	if (requeue)

I'm thinking that this requeue variable is redundant,

You can do instead:

	if (nvmet_tcp_check_queue_deadline(queue, ops) || pending)
>   		queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
>   }
> 