[LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

Sagi Grimberg sagi at grimberg.me
Thu Jan 12 00:41:43 PST 2017


>> I'd like to attend LSF/MM to discuss polling for block drivers.
>>
>> Currently there is blk-iopoll, but it is not as widely used as NAPI is in
>> the networking field, and according to Sagi's findings in [1], performance
>> with polling is not on par with IRQ usage.
>>
>> At LSF/MM I'd like to discuss whether it is desirable to have NAPI-like
>> polling in more block drivers and how to overcome the currently observed
>> performance issues.
>>
>> [1] http://lists.infradead.org/pipermail/linux-nvme/2016-October/006975.html
>
> A typical Ethernet network adapter delays the generation of an interrupt
> after it has received a packet. A typical block device or HBA does not delay
> the generation of an interrupt that reports an I/O completion. I think that
> is why polling is more effective for network adapters than for block
> devices. I'm not sure whether it is possible to achieve benefits similar to
> NAPI for block devices without implementing interrupt coalescing in the
> block device firmware. Note: for block device implementations that use the
> RDMA API, interrupt coalescing is already available (see also
> ib_modify_cq()).

Hey Bart,

I don't agree that the lack of interrupt coalescing is the reason why
irq-poll falls short for nvme or storage devices.
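
For reference, the coalescing knob Bart points at boils down to a
single call; a minimal sketch (the cq pointer and the count/period
values here are made up for illustration):

	/* fire a completion event only after 16 CQEs have arrived, or
	 * after 100 usecs have passed, whichever comes first */
	err = ib_modify_cq(cq, 16 /* cq_count */, 100 /* cq_period, usecs */);
	if (err)
		pr_warn("CQ moderation not supported: %d\n", err);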

First, when the nvme device fires an interrupt, the driver consumes
the completion(s) in the interrupt handler (usually a few more
completions will be waiting in the cq by the time the host starts
processing it). With irq-poll, we disable further interrupts and
schedule a soft-irq for the processing, which, if anything, improves
the completions-per-interrupt utilization (because it takes slightly
longer before we start processing the cq).
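
For reference, the flow above is just the standard irq-poll pattern; a
minimal sketch of how a driver wires it up (the my_* names are made-up
placeholders for device-specific code, the irq_poll_* calls are the
real <linux/irq_poll.h> API):

	#include <linux/interrupt.h>
	#include <linux/irq_poll.h>

	#define MY_POLL_WEIGHT 64

	struct my_queue {
		struct irq_poll iop;
		/* ... cq ring, doorbell, ... */
	};

	static irqreturn_t my_irq_handler(int irq, void *data)
	{
		struct my_queue *q = data;

		/* mask this queue's interrupt and defer the cq
		 * processing to soft-irq context */
		my_disable_queue_irqs(q);	/* placeholder */
		irq_poll_sched(&q->iop);
		return IRQ_HANDLED;
	}

	static int my_poll(struct irq_poll *iop, int budget)
	{
		struct my_queue *q = container_of(iop, struct my_queue, iop);

		/* reap at most 'budget' completions; my_process_cq is
		 * a placeholder for the driver's cq reaping loop */
		int done = my_process_cq(q, budget);

		if (done < budget) {
			/* cq drained: stop polling, unmask the irq */
			irq_poll_complete(iop);
			my_enable_queue_irqs(q);	/* placeholder */
		}
		return done;
	}

with irq_poll_init(&q->iop, MY_POLL_WEIGHT, my_poll) done once at
queue setup time.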

Moreover, irq-poll budgets the completion queue processing, which is
important for a few reasons (see the sketch after this list).

1. It prevents the hard-irq context abuse we have today. If other cpu
    cores keep pounding the same queue with more submissions, we can
    end up in a hard-lockup (which I've seen happen).

2. irq-poll maintains fairness between devices by correctly budgeting
    the processing of the different completion queues that share the
    same affinity. This becomes crucial when working with multiple nvme
    devices, each with multiple io queues that share the same IRQ
    assignment.

3. It reduces (or at least should reduce) the overall number of
    interrupts in the system, because we only re-enable interrupts once
    the completion queue has been fully processed.
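
For reference, (1) and (2) fall out of how the irq-poll soft-irq loop
spends its budget; condensed (and simplified, locking and flag handling
dropped) from lib/irq_poll.c, one soft-irq run over this cpu's list of
scheduled instances does roughly:

	int budget = irq_poll_budget;	/* 256 by default */

	while (!list_empty(list)) {
		struct irq_poll *iop;
		int work, weight;

		/* soft-irq window exhausted: punt to the next run so
		 * we don't hog the cpu in soft-irq context (this is
		 * what addresses (1)) */
		if (budget <= 0 || time_after(jiffies, start_time)) {
			__raise_softirq_irqoff(IRQ_POLL_SOFTIRQ);
			break;
		}

		iop = list_entry(list->next, struct irq_poll, list);

		/* each instance reaps at most its weight per round ... */
		weight = iop->weight;
		work = iop->poll(iop, weight);
		budget -= work;

		/* ... and a cq that used up its full weight rotates to
		 * the tail, behind the other queues sharing this cpu
		 * (this is what gives us (2)); a drained cq completes
		 * itself from ->poll() and drops off the list */
		if (work >= weight)
			list_move_tail(&iop->list, list);
	}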

So overall, I think it's very useful for nvme and other modern HBAs,
but unfortunately, other than solving (1), I wasn't able to see a
performance improvement, but rather a slight regression, and I can't
explain where it's coming from...


