[PATCH V3] nvmet-tcp: enable optional queue idle period tracking
Sagi Grimberg
sagi at grimberg.me
Tue Mar 23 22:52:41 GMT 2021
Hey Mark,
> nvmet-tcp: enable optional queue idle period tracking
>
> Add 'idle_poll_period_usecs' option used by io_work() to support
> network devices enabled with advanced interrupt moderation
> supporting a relaxed interrupt model. It was discovered that
> such a NIC used on the target was unable to support initiator
> connection establishment, caused by the existing io_work()
> flow that immediately exits after a loop with no activity and
> does not re-queue itself.
>
> With this new option a queue is assigned a period of time
> that no activity must occur in order to become 'idle'. Until
> the queue is idle the work item is requeued.
>
> The new module option is defined as changeable making it
> flexible for testing purposes.
>
> The pre-existing legacy behavior is preserved when no module option
> for idle_poll_period_usecs is specified.
>
> Signed-off-by: Mark Wunderlich <mark.wunderlich at intel.com>
> ---
It's easier to read here with the format:
Changes from v2:
- ...
- ...
Changes from v1:
- ...
- ...
> V2 of this patch removes the accounting that deducted time from the
> idle deadline period only during io_work activity. The result is a
> simpler solution, requiring only the selection of a sufficiently long
> optional time period that will catch any non-idle activity and keep
> a queue active.
>
> Testing was performed with a NIC using standard HW interrupt mode, with
> and without the new module option enabled. No measurable performance
> drop was seen when the patch was applied, whether or not the new option
> was specified. A side effect of using the new option with a standard
> NIC is a reduced context switch rate. We measured a drop from roughly
> 90K to less than 300 (for 32 active connections).
>
> For a NIC using a passive advanced interrupt moderation policy, it was
> then successfully able to achieve and maintain active connections with
> the target.
> ---
> V3 of this patch provides a bit more simplification, pulling the
> tracking code out of io_work and into two support functions. Now a
> single test is made after the process loop in io_work to determine
> whether optional idle tracking is active. The base logic of idle
> tracking used as the condition to re-queue the worker remains the same.
> ---
> drivers/nvme/target/tcp.c | 44 +++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 41 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index dc1f0f647189..0e86115af9f4 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -29,6 +29,16 @@ static int so_priority;
> module_param(so_priority, int, 0644);
> MODULE_PARM_DESC(so_priority, "nvmet tcp socket optimize priority");
>
> +/* Define a time period (in usecs) that io_work() shall sample an activated
> + * queue before determining it to be idle. This optional module behavior
> + * can enable NIC solutions that support socket optimized packet processing
> + * using advanced interrupt moderation techniques.
> + */
> +static int idle_poll_period_usecs;
> +module_param(idle_poll_period_usecs, int, 0644);
> +MODULE_PARM_DESC(idle_poll_period_usecs,
> + "nvmet tcp io_work poll till idle time period in usecs");
> +
> #define NVMET_TCP_RECV_BUDGET 8
> #define NVMET_TCP_SEND_BUDGET 8
> #define NVMET_TCP_IO_WORK_BUDGET 64
> @@ -119,6 +129,9 @@ struct nvmet_tcp_queue {
> struct ahash_request *snd_hash;
> struct ahash_request *rcv_hash;
>
> + unsigned long poll_start;
> + unsigned long poll_end;
> +
> spinlock_t state_lock;
> enum nvmet_tcp_queue_state state;
>
> @@ -1198,11 +1211,34 @@ static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
> spin_unlock(&queue->state_lock);
> }
>
> +static inline void nvmet_tcp_arm_queue_deadline(struct nvmet_tcp_queue *queue)
> +{
> + queue->poll_start = jiffies;
> + queue->poll_end = queue->poll_start +
> + usecs_to_jiffies(idle_poll_period_usecs);
> +}
> +
> +static bool nvmet_tcp_check_queue_deadline(struct nvmet_tcp_queue *queue,
> + int ops)
> +{
> + if (!idle_poll_period_usecs)
> + return false;
> +
> + if (ops || !queue->poll_start)
> + nvmet_tcp_arm_queue_deadline(queue);
> +
> + if (!time_in_range(jiffies, queue->poll_start, queue->poll_end)) {
> + queue->poll_start = queue->poll_end = 0;
> + return false;
> + }
> + return true;
Can this be made simpler somehow, without clearing the poll limits and
then looking at poll_start to indicate that?
> +}
> +
> static void nvmet_tcp_io_work(struct work_struct *w)
> {
> struct nvmet_tcp_queue *queue =
> container_of(w, struct nvmet_tcp_queue, io_work);
> - bool pending;
> + bool pending, requeue;
> int ret, ops = 0;
>
> do {
> @@ -1223,9 +1259,11 @@ static void nvmet_tcp_io_work(struct work_struct *w)
> } while (pending && ops < NVMET_TCP_IO_WORK_BUDGET);
>
> /*
> - * We exahusted our budget, requeue our selves
> + * Requeue the worker if idle deadline period is in progress or any
> + * ops activity was recorded during the do-while loop above.
> */
> - if (pending)
> + requeue = nvmet_tcp_check_queue_deadline(queue, ops) || pending;
> + if (requeue)
I think this requeue variable is redundant; you can do instead:
if (nvmet_tcp_check_queue_deadline(queue, ops) || pending)
> queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
> }
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
>