[PATCHv2] nvme-pci: allow unmanaged interrupts
Ming Lei
ming.lei at redhat.com
Fri May 10 17:44:50 PDT 2024
On Fri, May 10, 2024 at 06:29:23PM -0600, Keith Busch wrote:
> On Sat, May 11, 2024 at 07:47:26AM +0800, Ming Lei wrote:
> > On Fri, May 10, 2024 at 10:46:45AM -0700, Keith Busch wrote:
> > > map->queue_offset = qoff;
> > > - if (i != HCTX_TYPE_POLL && offset)
> > > + if (managed_irqs && i != HCTX_TYPE_POLL && offset)
> > > blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
> > > else
> > > blk_mq_map_queues(map);
> >
> > Now the queue mapping is built with nothing from irq affinity which is
> > set up from userspace, and performance could be pretty bad.
>
> This just decouples the sw from the irq mappings. Every cpu still has a
> blk-mq hctx, there's just no connection to the completing CPU if you
> enable this.
I don't object to unmanaged irqs, which are actually supported by some
scsi hosts too, but all or most of them still wire the pci irq vector
affinities to the hw queues, instead of simply using the mapping from
blk_mq_map_queues().
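
What I mean is something like the untested sketch below (the function
name is made up, just to show the idea): even with unmanaged vectors,
the driver can still derive the queue map from whatever affinity each
vector currently has, instead of falling back to blk_mq_map_queues()
for everything:

#include <linux/blk-mq.h>
#include <linux/irq.h>
#include <linux/pci.h>

/*
 * Untested sketch, name made up: map each hw queue to the CPUs its
 * vector is currently affine to, so submissions and completions stay
 * on the same set of CPUs even after the user changes
 * /proc/irq/<N>/smp_affinity.
 */
static void nvme_map_queues_from_vector_affinity(struct blk_mq_queue_map *map,
                                                 struct pci_dev *pdev,
                                                 int offset)
{
        unsigned int queue, cpu;

        for (queue = 0; queue < map->nr_queues; queue++) {
                const struct cpumask *mask;
                int irq = pci_irq_vector(pdev, queue + offset);

                if (irq < 0)
                        goto fallback;

                mask = irq_get_affinity_mask(irq);
                if (!mask || cpumask_empty(mask))
                        goto fallback;

                for_each_cpu(cpu, mask)
                        map->mq_map[cpu] = map->queue_offset + queue;
        }
        return;

fallback:
        /* no usable affinity info, use the plain spread */
        blk_mq_map_queues(map);
}

CPUs not covered by any vector's mask would still need handling (the
above leaves them on queue 0), but the point is that the hw queue and
its completion vector stay tied together even though the irqs are not
managed.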
>
> Everyone expects nvme performance will suffer. IO latency and CPU
> efficiency are not everyone's top priority, so allowing people to
> optimize for something else seems like a reasonable request.
I guess more people may be interested in that 'something else'; care to
share it in the commit log, since nvme is going to support this?
>
> > Is there any benefit to use unmanaged irq in this way?
>
> The immediate desire is more predictable scheduling on a subset of CPUs
> by steering hardware interrupts somewhere else. It's the same reason
> RDMA undid managed interrupts.
>
> 231243c82793428 ("Revert "mlx5: move affinity hints assignments to generic code")
The above commit only mentions that it becomes inflexible, since the user
can't adjust irq affinity any more.
That is understandable for networking: there is a long history of people
needing to adjust irq affinity from user space (by writing to
/proc/irq/<N>/smp_affinity).
Thanks,
Ming