RFC: Allow block drivers to poll for I/O instead of sleeping
Ingo Molnar
mingo at kernel.org
Mon Jun 24 04:21:47 EDT 2013
* David Ahern <dsahern at gmail.com> wrote:
> On 6/23/13 3:09 AM, Ingo Molnar wrote:
> >If an IO driver is implemented properly then it will batch up requests for
> >the controller, and get IRQ-notified when a (sub-)batch of buffers has
> >completed.
> >
> >If there's any spinning done then it should be NAPI-alike polling: a
> >single "is stuff completed" polling pass per new block of work submitted,
> >to opportunistically interleave completion with submission work.
> >
> >I don't see where active spinning would improve performance
> >compared to a NAPI-alike technique. Your numbers obviously show a speedup
> >we'd like to have, I'm just wondering whether the same speedup (or even
> >more) could be implemented via:
> >
> > - smart batching that rate-limits completion IRQs in essence
> > + NAPI-alike polling
> >
> >... which would almost never result in IRQ driven completion when we are
> >close to CPU-bound and while not yet saturating the IO controller's
> >capacity.
> >
> >The spinning approach you add has the disadvantage of actively wasting CPU
> >time, which could be used to run other tasks. In general it's much better
> >to make sure the completion IRQs are rate-limited and just schedule. This
> >(combined with a metric ton of fine details) is what the networking code
> >does in essence, and they have no trouble reaching very high throughput.
>
> Networking code has a similar proposal for low latency sockets using
> polling: https://lwn.net/Articles/540281/

In that case it might make sense to try the generic approach I suggested
in the previous mail, which would measure average sleep latencies of
tasks, and would do light idle-polling instead of the more expensive
switch-to-the-idle-task context switch plus associated RCU, nohz, etc.
busy-CPU-tear-down and the symmetric build-up work on idle wakeup.

The IO driver would still have to take an IRQ though, preferably on the
CPU that runs the IO task ...
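
To make the idle-polling heuristic above concrete, here is a rough
userspace sketch, not kernel code. It assumes a per-task running average
of sleep latencies and compares it against an assumed cost for the full
switch to the idle task. All names (sleep_stats, sleep_stats_update,
should_idle_poll) and the IDLE_SWITCH_COST_NS value are hypothetical and
only serve to illustrate the decision:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-task state: a running average of how long the task
 * sleeps each time it blocks. Nothing here is an existing kernel API.
 */
struct sleep_stats {
	uint64_t avg_sleep_ns;	/* exponentially weighted moving average */
	bool primed;		/* false until the first sample is recorded */
};

/*
 * Assumed cost of switching to the idle task and back (context switch
 * plus RCU/nohz tear-down and build-up). The value is made up.
 */
#define IDLE_SWITCH_COST_NS 4000ULL

/* Fold one observed sleep into the average: new = 7/8 * old + 1/8 * sample. */
static void sleep_stats_update(struct sleep_stats *s, uint64_t sample_ns)
{
	if (!s->primed) {
		s->avg_sleep_ns = sample_ns;
		s->primed = true;
		return;
	}
	s->avg_sleep_ns = (s->avg_sleep_ns * 7 + sample_ns) / 8;
}

/*
 * Poll lightly only when the task is expected to wake up sooner than
 * the idle switch would cost; otherwise schedule as usual.
 */
static bool should_idle_poll(const struct sleep_stats *s)
{
	return s->primed && s->avg_sleep_ns < IDLE_SWITCH_COST_NS;
}
```

In this sketch the wakeup path would call sleep_stats_update() with the
measured sleep time, and the blocking path would consult
should_idle_poll() before deciding whether to switch to the idle task.
A task that keeps sleeping for about 2 usecs would end up polling, while
one that sleeps for 500 usecs would go through the normal idle path.
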
Thanks,
Ingo