[LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

Sagi Grimberg sagi at grimberg.me
Thu Jan 19 00:12:17 PST 2017


>>> I think you missed:
>>> http://git.infradead.org/nvme.git/commit/49c91e3e09dc3c9dd1718df85112a8cce3ab7007
>>
>> I indeed did, thanks.
>>
> But it doesn't help.
>
> We're still having to wait for the first interrupt, and if we're really
> fast that's the only completion we have to process.
>
> Try this:
>
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index b4b32e6..e2dd9e2 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -623,6 +623,8 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>         }
>         __nvme_submit_cmd(nvmeq, &cmnd);
>         spin_unlock(&nvmeq->sq_lock);
> +       disable_irq_nosync(nvmeq_irq(irq));
> +       irq_poll_sched(&nvmeq->iop);

a. This can end up disabling the irq twice, which is wrong, if only
because it will generate a warning.

b. This would trigger ksoftirqd far too often. For this to be effective
it needs to run only when it should, optimally when the completion
queue has a batch of completions waiting.

After a deeper analysis, I agree with Bart that interrupt coalescing is
needed for this to work. The problem with nvme coalescing, as Jens said,
is the death penalty of its 100us granularity. Hannes, Johannes, how does
it look with the devices you are testing with?

Also, I think that adaptive moderation is needed for this to
work well. I know that some networking drivers implemented adaptive
moderation in SW before having HW support for it. It can be done by
maintaining stats and having a periodic work item that looks at them and
adjusts the moderation parameters.
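
To make the idea concrete, here is a rough user-space sketch of the
"stats plus periodic work" scheme. It is not driver code: struct
mod_stats, struct mod_params, mod_adapt() and the MOD_* thresholds are
all hypothetical names and made-up numbers, and the time field is kept
in 100us units only to mirror the granularity mentioned above. A real
implementation would update the stats from the completion path and push
the new parameters to the device from a delayed work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue stats, bumped from the completion path. */
struct mod_stats {
	uint64_t completions;	/* completions seen since the last sample */
};

/* Hypothetical moderation parameters that would be pushed to the device. */
struct mod_params {
	uint8_t time_100us;	/* max aggregation time, in 100us units */
	uint8_t threshold;	/* completions to aggregate per interrupt */
};

/* Illustrative tuning knobs, not measured values. */
#define MOD_HIGH_RATE	50000	/* completions/sec: start batching harder */
#define MOD_LOW_RATE	5000	/* completions/sec: favour latency */
#define MOD_MAX_THR	16
#define MOD_MAX_TIME	1

/*
 * Meant to be called once per sampling period (e.g. from periodic work).
 * Looks at the completion rate over the last period and adjusts the
 * moderation parameters, then resets the stats for the next period.
 */
static void mod_adapt(struct mod_stats *st, struct mod_params *p,
		      unsigned int period_ms)
{
	uint64_t rate;

	assert(period_ms > 0);
	rate = st->completions * 1000 / period_ms;

	if (rate > MOD_HIGH_RATE) {
		/* Busy queue: aggregate more completions per interrupt. */
		if (p->threshold < MOD_MAX_THR)
			p->threshold = p->threshold ? p->threshold * 2 : 2;
		p->time_100us = MOD_MAX_TIME;
	} else if (rate < MOD_LOW_RATE) {
		/* Mostly idle queue: turn moderation off to keep latency low. */
		p->threshold = 0;
		p->time_100us = 0;
	}
	/* In between the two rates, leave the parameters unchanged. */

	st->completions = 0;
}
```

The gap between the two rates is there so the parameters don't flap on
every sample; whether a simple rate threshold like this is good enough
is exactly the kind of thing that would need testing on real devices.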

Does anyone think that this is something we should consider?


