[PATCH v10 13/13] docs: add io_queue flag to isolcpus

Aaron Tomlin atomlin at atomlin.com
Wed Apr 8 08:58:27 PDT 2026


On Mon, Apr 06, 2026 at 11:29:38AM +0800, Ming Lei wrote:
> I don't think there is such breaking isolation thing. For iopoll, if
> applications won't submit polled IO on isolated CPUs, everything is just
> fine. If they do it, IO may be reaped from isolated CPUs, that is just their
> choice, anything is wrong?

Hi Ming,

Thank you for your follow-up. You make a fair point regarding polling
queues and application choice: if an application explicitly binds to an
isolated CPU and submits polled operations, it is indeed actively electing
to utilise that core and to accept the resulting behaviour.

However, the architectural challenge arises from how the kernel places
these queues when the application does not explicitly make that choice.
Because poll queues never use interrupts, they are completely invisible
to the managed interrupt subsystem.

If we were to rely exclusively on the managed_irq flag, the block layer
would blindly map these non-interrupt-driven polling queues to isolated
CPUs. If a general background storage operation were then routed to such
a queue, the isolated core would be forced to spin in a tight loop
waiting for the hardware completion. This would completely monopolise
the core and destroy any real-time isolation guarantees without the
user-space application ever having requested it.

This illustrates precisely why the io_queue flag is a mechanical necessity.
Its primary objective is to act as a comprehensive block layer isolation
boundary: it structurally restricts both hardware queue placement and
managed interrupt affinity to housekeeping CPUs, ensuring that no
storage queue of any kind is mapped onto an isolated CPU.

To achieve this reliably, this series extends struct irq_affinity with a
new CPU mask [1]. This mask is explicitly set to the result of
blk_mq_online_queue_affinity(). By passing this housekeeping mask
directly through the interrupt affinity parameters, we ensure that the
native affinity calculation is strictly bounded to non-isolated CPUs from
the moment the device probes.

This structural enhancement allows device drivers to inherit the
isolation constraints seamlessly, without requiring bespoke,
driver-specific logic. A clear example can be seen in the modifications
to the Broadcom MPI3 storage controller driver [2]. By leveraging the
extended struct irq_affinity, the driver guarantees that its queues and
corresponding managed interrupts are aligned with the system
housekeeping configuration, completely avoiding the isolated CPUs
during allocation.

[1]: https://lore.kernel.org/lkml/20260401222312.772334-5-atomlin@atomlin.com/
[2]: https://lore.kernel.org/lkml/20260401222312.772334-8-atomlin@atomlin.com/

I hope this better illustrates the mechanical necessity of the io_queue
flag and the corresponding changes to the interrupt affinity structures.

> > Every logical CPU, including the isolated ones, must logically map to a
> > hardware context in order to submit input and output requests, saying they
> > are completely restricted is indeed stale and technically inaccurate. The
> > isolation mechanism actually ensures that the hardware contexts themselves
> > are serviced by the housekeeping CPUs, while the isolated CPUs are simply
> > mapped onto these housekeeping queues for submission purposes. I will
> > rewrite this paragraph to accurately reflect this topology, ensuring it
> > aligns perfectly with the behaviour introduced in patch 10.
> 
> I am not sure if the above words is helpful from administrator viewpoint about
> the two kernel parameters.
> 
> IMO, only two differences from this viewpoint:
> 
> 1) `io_queue` may reduce nr_hw_queues
> 
> 2) when application submits IO from isolated CPUs, `io_queue` can complete
> IO from housekeeping CPUs.

Acknowledged.


Kind regards,
-- 
Aaron Tomlin