[PATCH v10 13/13] docs: add io_queue flag to isolcpus

Aaron Tomlin atomlin at atomlin.com
Fri Apr 10 12:31:22 PDT 2026


On Fri, Apr 10, 2026 at 10:44:15AM +0800, Ming Lei wrote:
> For unmanaged interrupts, user can set irq affinity on housekeeping cpus
> from /proc or kernel command line.
> 
> Why is unmanaged interrupts involved with this patchset?

Thank you for your continued engagement and for ultimately supporting the
progression of this series.

To clarify the handling of unmanaged interrupts: while it is entirely true
that an administrator could attempt to configure affinity manually, either
with "irqaffinity=" or via procfs after the fact, this series actively
addresses unmanaged interrupts.

> > CPUs, thereby breaking isolation. By applying the constraint via io_queue
> > at the block layer, we restrict the hardware queue count and map the
> > isolated CPUs to the housekeeping queues, ensuring isolation is maintained
> > regardless of whether the driver uses managed interrupts.
> > 
> > Does the above help?
> 
> As I mentioned, managed irq already covers it:
> 
> - typically application submits IO from housekeeping CPUs, which is mapped
>   to one hardware, which effective interrupt affinity excludes isolated
>   CPUs if possible.
> 
> I'd suggest to share some real problems you found instead of something
> imaginary.

If we trace how mpi3mr sets up its ISRs, it relies heavily on the core
grouping logic:

mpi3mr_setup_isr
{
  unsigned int irq_flags = PCI_IRQ_MSIX

  struct irq_affinity desc = { .pre_vectors =  1, .post_vectors = 1, }

  pci_alloc_irq_vectors_affinity(mrioc->pdev, min_vec,
                                 max_vectors, irq_flags, &desc)
  {
    if (flags & PCI_IRQ_MSIX) {
      // affd != NULL
      __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs, affd, flags)
      {

        for (;;) {

          msix_capability_init(dev, entries, nvec, affd)
          {
            msix_setup_interrupts(dev, entries, nvec, affd)
            {
              // affd
              irq_create_affinity_masks(nvec, affd)
              {
                for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
                  unsigned int nr_masks, this_vecs = affd->set_size[i]
                  struct cpumask *result = group_cpus_evenly(this_vecs,
                                                             &nr_masks)
                  if (!result) {
                    kfree(masks)
                    return NULL
                  }

                  for (int j = 0; j < nr_masks; j++)
                    cpumask_copy(&masks[curvec + j].mask, &result[j])
                  kfree(result)

                  curvec += nr_masks
                  usedvecs += nr_masks
                }
              }
            }
          }
        }
      }
    }
  }
}

The critical issue lies in the call to group_cpus_evenly(). Without this
patchset, the core grouping logic has no constraint that makes it respect
CPU isolation. It is entirely possible, and indeed happens in practice, for
an isolated CPU to be assigned to a CPU mask group.

The newer implementation of irq_create_affinity_masks() introduced by this
series resolves this. It considers the new CPU mask added to the IRQ
affinity descriptor. When group_mask_cpus_evenly() is called, this mask is
evaluated [1], guaranteeing that isolated CPUs are entirely excluded from
the mask groups.

[1]: https://lore.kernel.org/lkml/20260401222312.772334-8-atomlin@atomlin.com/
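
To illustrate the difference, here is a minimal userspace sketch. It is not
the kernel implementation: the sketch_* names, the 64-bit mask type and the
round-robin spreading are all assumptions made for illustration (the real
grouping code is also NUMA-aware). It only shows that grouping without a
constraint can place isolated CPUs into vector masks, while grouping within
a housekeeping mask cannot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t sketch_mask_t;

/*
 * Spread every CPU set in 'candidates' round-robin over 'ngroups' masks.
 * Simplified stand-in for the unconstrained grouping step; returns the
 * number of non-empty groups.
 */
static unsigned int sketch_group_cpus(sketch_mask_t candidates,
                                      unsigned int ngroups,
                                      sketch_mask_t *groups)
{
    unsigned int i, cpu, next = 0, used = 0;

    assert(ngroups > 0);

    for (i = 0; i < ngroups; i++)
        groups[i] = 0;

    for (cpu = 0; cpu < 64; cpu++) {
        if (!(candidates & ((sketch_mask_t)1 << cpu)))
            continue;
        groups[next] |= (sketch_mask_t)1 << cpu;
        next = (next + 1) % ngroups;
    }

    for (i = 0; i < ngroups; i++)
        if (groups[i])
            used++;

    return used;
}

/*
 * Constrained variant (assumption): only CPUs present in 'housekeeping'
 * are eligible, so isolated CPUs can never end up in a group.
 */
static unsigned int sketch_group_mask_cpus(sketch_mask_t online,
                                           sketch_mask_t housekeeping,
                                           unsigned int ngroups,
                                           sketch_mask_t *groups)
{
    return sketch_group_cpus(online & housekeeping, ngroups, groups);
}
```

With eight online CPUs, CPUs 2-7 isolated and four vectors, the
unconstrained call spreads the isolated CPUs across all four masks, whereas
the constrained call fills only two masks, with CPU 0 and CPU 1, and leaves
the other two empty. That matches the reduced hardware queue count
discussed earlier in the thread.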

> > > > > IMO, only two differences from this viewpoint:
> > > > >
> > > > > 1) `io_queue` may reduce nr_hw_queues
> > > > >
> > > > > 2) when application submits IO from isolated CPUs, `io_queue` can complete
> > > > > IO from housekeeping CPUs.
> > > >
> > > > Acknowledged.
> > > 
> > > Are there other major differences besides the two mentioned above?
> > 
> > I believe the above is sufficient. Please let me know your thoughts.
> 
> Both two are small improvement, not bug fixes. However the user has to pay
> the cost of potential failing of offlining CPU. Not mention the little 
> complicated change: `19 files changed, 378 insertions(+), 48 deletions(-)`
> 
> But I won't object if you can update the commit log/kernel command line
> doc and fix the issue found in review.

Thank you again for your rigorous review, patience, and invaluable
guidance.


Kind regards,
-- 
Aaron Tomlin