[PATCH] nvmet-tcp: enable optional queue idle period tracking
Sagi Grimberg
sagi at grimberg.me
Mon Feb 15 16:18:17 EST 2021
On 2/11/21 3:42 PM, Wunderlich, Mark wrote:
> nvmet-tcp: enable optional queue idle period tracking
>
> Add 'queue idle period' option used by io_work() to support
> network devices enabled with advanced interrupt moderation
> supporting a relaxed interrupt model. It was discovered that
> such a NIC used on the target was unable to support initiator
> connection establishment, caused by the existing io_work()
> flow that immediately exits and does not re-queue itself at
> the first loop with no activity.
>
> With this new option a queue is assigned a period of time
> that no activity must occur in order to become 'idle'. Until
> the queue is idle the work item is requeued.
>
> The new module option is defined as changeable making it
> flexible for testing purposes.
>
> The pre-existing legacy behavior is preserved when no module option
> for queue idle period is specified.
>
> Signed-off-by: Mark Wunderlich <mark.wunderlich at intel.com>
> ---
> Testing was performed with a NIC using standard HW interrupt mode, with
> and without the new module option enabled. No measurable performance
> drop was seen when the patch was applied and the new option specified
> or not. A side effect of a standard NIC using the new option
> will reduce the context switch rate. We measured a drop from roughly
> 90K to less than 300 (for 32 active connections).
Nice!
>
> For a NIC using a passive advanced interrupt moderation policy, it was
> then successfully able to achieve and maintain active connections with
> the target.
> ---
> drivers/nvme/target/tcp.c | 65 ++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 64 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index dc1f0f647189..f5e6e2806841 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -29,6 +29,16 @@ static int so_priority;
> module_param(so_priority, int, 0644);
> MODULE_PARM_DESC(so_priority, "nvmet tcp socket optimize priority");
>
> +/*
> + * Define a time period (in msecs)
Is this in msecs or usecs? Anyway I think it should be in the name as
well.
that io_work() shall sample an activated
> + * queue before determining it to be idle. This optional module behavior
> + * can enable NIC solutions that support socket optimized packet processing
> + * using advanced interrupt moderation techniques.
> + */
> +static int queue_idle_period;
> +module_param(queue_idle_period, int, 0644);
> +MODULE_PARM_DESC(queue_idle_period, "nvmet tcp io_work queue idle time period");
I think this should be called "queue_poll_period" maybe? because this is
the period a queue will keep actively trying to read data from the
socket regardless if there is any activity or not right?
Or does this mean a different semantics with this?
> +
> #define NVMET_TCP_RECV_BUDGET 8
> #define NVMET_TCP_SEND_BUDGET 8
> #define NVMET_TCP_IO_WORK_BUDGET 64
> @@ -96,6 +106,7 @@ struct nvmet_tcp_queue {
> struct work_struct io_work;
> struct nvmet_cq nvme_cq;
> struct nvmet_sq nvme_sq;
> + unsigned long idle_period;
poll_period?
>
> /* send state */
> struct nvmet_tcp_cmd *cmds;
> @@ -1198,12 +1209,42 @@ static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
> spin_unlock(&queue->state_lock);
> }
>
> +/*
> + * This worker function will process all send and recv packet
> + * activity for a queue. It will loop on the queue for up to a
> + * given maximum operation budget, or until there is no activity
> + * during a single loop iteration.
> + *
> + * Two exit modes are possible.
> + *
> + * The default 'pending' mode where the worker will re-queue
> + * itself, after exiting the work loop, only if any send or recv
> + * activity was recorded during the last pass within the loop.
> + *
> + * A optional 'idle period' mode where in addition to re-queueing
> + * itself because of activity it also tracks if a queue has not reached an
> + * assigned 'idle' period of time. The worker consumes from the assigned
> + * time period, across many potential invocations, until it is expired.
> + * A queue with activity always being awarded a fresh time
> + * period for processing.
> + */
> static void nvmet_tcp_io_work(struct work_struct *w)
> {
> struct nvmet_tcp_queue *queue =
> container_of(w, struct nvmet_tcp_queue, io_work);
> bool pending;
> int ret, ops = 0;
> + long period_remaining;
> + unsigned long idle_deadline;
> +
> + /* Setup use of optional tracking for idle time period */
Redundant comment.
> + if (queue_idle_period) {
> + /* Assign the queue an idle period if not already set */
> + if (!queue->idle_period)
> + queue->idle_period =
> + usecs_to_jiffies(queue_idle_period);
> + idle_deadline = jiffies + queue->idle_period;
Why do you need to keep a period time?
Why don't you just maintain queue->poll_start and just check if
your past the queue_poll time addition:
if (queue_idle_period && queue->start_poll == 0)
queue->start_poll = jiffies;
...
[io_work loop]
...
if (queue_poll_period) {
if (!time_after(jiffies, queue->start_poll +
usecs_to_jiffies(queue_poll_period)))
pending = true;
else
queue->start_poll = 0;
}
Can this work?
> + }
>
> do {
> pending = false;
> @@ -1222,8 +1263,30 @@ static void nvmet_tcp_io_work(struct work_struct *w)
>
> } while (pending && ops < NVMET_TCP_IO_WORK_BUDGET);
>
> + /* If optional deadline mode active, determine if queue has reached its
> + * idle process deadline limit. Remaining deadline is calculated. Any ops
> + * activity awards the queue a new deadline period.
> + */
Comments should stay consistent in style...
> + if (queue_idle_period) {
> + /*
> + * Clear to award active non-idle queue new period, or
> + * reset for future queue activity after exit when idle reached.
> + */
> + queue->idle_period = 0;
> + if (ops > 0) {
> + pending = true;
What is this useful for?
> + } else if (!time_after(jiffies, idle_deadline)) {
> + period_remaining = (long)(idle_deadline - jiffies);
> + if (period_remaining > 0) {
> + pending = true;
> + queue->idle_period = period_remaining;
> + }
> + }
> + }
See above if it is simpler?
> +
> /*
> - * We exahusted our budget, requeue our selves
> + * We requeue ourself when pending indicates there was activity
> + * recorded, or queue has not reached optional idle time period.
> */
> if (pending)
> queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
More information about the Linux-nvme
mailing list