[PATCH V4 1/3] driver core: mark device as irq affinity managed if any irq is managed

Bjorn Helgaas helgaas at kernel.org
Tue Jul 20 17:30:11 PDT 2021


On Sat, Jul 17, 2021 at 05:30:43PM +0800, Ming Lei wrote:
> On Fri, Jul 16, 2021 at 03:01:54PM -0500, Bjorn Helgaas wrote:
> > On Thu, Jul 15, 2021 at 08:08:42PM +0800, Ming Lei wrote:
> > > irq vector allocation with managed affinity may be used by driver, and
> > > blk-mq needs this info because managed irq will be shutdown when all
> > > CPUs in the affinity mask are offline.
> > > 
> > > The info of using managed irq is often produced by drivers(pci subsystem,
> > 
> > Add space between "drivers" and "(".
> > s/pci/PCI/
> 
> OK.
> 
> > Does this "managed IRQ" (or "managed affinity", not sure what the
> > correct terminology is here) have something to do with devm?
> > 
> > > platform device, ...), and it is consumed by blk-mq, so different subsystems
> > > are involved in this info flow
> > 
> > Add period at end of sentence.
> 
> OK.
> 
> > > Address this issue by adding one field of .irq_affinity_managed into
> > > 'struct device'.
> > > 
> > > Suggested-by: Christoph Hellwig <hch at lst.de>
> > > Signed-off-by: Ming Lei <ming.lei at redhat.com>
> > > ---
> > >  drivers/base/platform.c | 7 +++++++
> > >  drivers/pci/msi.c       | 3 +++
> > >  include/linux/device.h  | 1 +
> > >  3 files changed, 11 insertions(+)
> > > 
> > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> > > index 8640578f45e9..d28cb91d5cf9 100644
> > > --- a/drivers/base/platform.c
> > > +++ b/drivers/base/platform.c
> > > @@ -388,6 +388,13 @@ int devm_platform_get_irqs_affinity(struct platform_device *dev,
> > >  				ptr->irq[i], ret);
> > >  			goto err_free_desc;
> > >  		}
> > > +
> > > +		/*
> > > +		 * mark the device as irq affinity managed if any irq affinity
> > > +		 * descriptor is managed
> > > +		 */
> > > +		if (desc[i].is_managed)
> > > +			dev->dev.irq_affinity_managed = true;
> > >  	}
> > >  
> > >  	devres_add(&dev->dev, ptr);
> > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > > index 3d6db20d1b2b..7ddec90b711d 100644
> > > --- a/drivers/pci/msi.c
> > > +++ b/drivers/pci/msi.c
> > > @@ -1197,6 +1197,7 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> > >  	if (flags & PCI_IRQ_AFFINITY) {
> > >  		if (!affd)
> > >  			affd = &msi_default_affd;
> > > +		dev->dev.irq_affinity_managed = true;
> > 
> > This is really opaque to me.  I can't tell what the connection between
> > PCI_IRQ_AFFINITY and irq_affinity_managed is.
> 
> Comment for PCI_IRQ_AFFINITY is 'Auto-assign affinity',
> 'irq_affinity_managed' basically means that irq's affinity is managed by
> kernel.
> 
> What blk-mq needs is exactly if PCI_IRQ_AFFINITY is applied when
> allocating irq vectors. When PCI_IRQ_AFFINITY is used, genirq will
> shutdown the irq when all CPUs in the assigned affinity are offline,
> then blk-mq has to drain all in-flight IOs which will be completed
> via this irq and prevent new IO. That is the connection.
> 
> Or you think 'irq_affinity_managed' isn't named well?

I've been looking at "devm" management, where there is a concept of
devm-managed IRQs, and you mentioned "managed IRQ" in the commit log.
But IIUC this is a completely different sort of management.
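
To make sure we're talking about the same kind of "managed", here is how
I currently read the difference, as a hypothetical driver sketch (the
driver name, handler, and vector counts are all made up):

#include <linux/interrupt.h>
#include <linux/pci.h>

/* Hypothetical handler, only here so the sketch is complete. */
static irqreturn_t foo_handler(int irq, void *data)
{
	return IRQ_HANDLED;
}

static int foo_setup_irqs(struct pci_dev *pdev)
{
	struct irq_affinity affd = { .pre_vectors = 1 };
	int nvecs, ret;

	/*
	 * Affinity-managed: with PCI_IRQ_AFFINITY, genirq spreads the
	 * vectors across CPUs and shuts a vector down once every CPU in
	 * its affinity mask is offline.  This is what the new
	 * irq_affinity_managed flag in struct device records.
	 */
	nvecs = pci_alloc_irq_vectors_affinity(pdev, 1, 8,
					       PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					       &affd);
	if (nvecs < 0)
		return nvecs;

	/*
	 * devm-managed: only the *lifetime* of the request is tied to the
	 * device; it says nothing about how the affinity is handled.
	 */
	ret = devm_request_irq(&pdev->dev, pci_irq_vector(pdev, 0),
			       foo_handler, 0, "foo", pdev);
	if (ret)
		pci_free_irq_vectors(pdev);
	return ret;
}

If that's right, it might be worth spelling out in the commit log that
"managed" here means managed *affinity*, not devm-style lifetime
management.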

> > AFAICT the only place irq_affinity_managed is ultimately used is
> > blk_mq_hctx_notify_offline(), and there's no obvious connection
> > between that and this code.
> 
> I believe the connection is described in comment.

You mean the comment that says "hctx needn't to be deactivated in case
managed irq isn't used"?  Sorry, that really doesn't explain to me why
pci_alloc_irq_vectors_affinity() needs to set irq_affinity_managed.
There's just a lot of magic here that cannot be deduced by reading the
code.

Nit: s/needn't to be/needn't be/ in that comment.  Or maybe even
"Deactivate hctx only when managed IRQ is used"?  I still have no idea
what that means, but at least it's easier to read :)  Or maybe this is
actually "draining a queue" instead of "deactivating"?

Bjorn


