[PATCH RESEND] lib/group_cpus: make group CPU cluster aware
Ming Lei
ming.lei at redhat.com
Mon Nov 10 19:25:37 PST 2025
On Tue, Nov 11, 2025 at 10:06:08AM +0800, Wangyang Guo wrote:
> As CPU core counts increase, the number of NVMe IRQs may be smaller than
> the total number of CPUs. This forces multiple CPUs to share the same
> IRQ. If the IRQ affinity and the CPU’s cluster do not align, a
> performance penalty can be observed on some platforms.
Can you add details on why/how the CPU cluster isn't aligned with IRQ
affinity, and on how the performance penalty is caused?
Is it caused by remote IO completion in blk_mq_complete_need_ipi()?

	/* same CPU or cache domain and capacity? Complete locally */
	if (cpu == rq->mq_ctx->cpu ||
	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
	     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
	     cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
		return false;

If yes, which case are you addressing: cache domain or capacity?
AMD's CCX shares L3 cache inside a NUMA node, which has a similar issue.
I guess this patchset may cover it?
> This patch improves IRQ affinity by grouping CPUs by cluster within each
> NUMA domain, ensuring better locality between CPUs and their assigned
> NVMe IRQs.
I will look into this patch, but I feel an easier way is to build a
sub-node (cluster) cpumask array and just spread over the sub-nodes (clusters).
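
Roughly, that idea could look like the userspace sketch below. It is only
an illustration under stated assumptions, not the lib/group_cpus.c code:
plain 64-bit masks stand in for struct cpumask, the cpu-to-cluster table is
assumed to be given with dense cluster ids, and mask_weight(),
build_cluster_masks() and spread_over_clusters() are made-up names.

```c
#include <stdint.h>

#define MAX_CLUSTERS 64

/* Hypothetical helper: number of CPUs set in a 64-bit mask. */
static int mask_weight(uint64_t m)
{
	int n = 0;

	for (; m; m &= m - 1)
		n++;
	return n;
}

/*
 * Build one CPU mask per cluster from a cpu -> cluster-id table.
 * Assumes ncpus <= 64 and dense cluster ids in [0, MAX_CLUSTERS).
 * Returns the number of clusters.
 */
static int build_cluster_masks(const int *cluster_of, int ncpus,
			       uint64_t *masks)
{
	int nclusters = 0;

	for (int i = 0; i < MAX_CLUSTERS; i++)
		masks[i] = 0;
	for (int cpu = 0; cpu < ncpus; cpu++) {
		int c = cluster_of[cpu];

		masks[c] |= 1ULL << cpu;
		if (c + 1 > nclusters)
			nclusters = c + 1;
	}
	return nclusters;
}

/*
 * Spread ngroups over the clusters of one node.
 * - ngroups < nclusters: whole clusters are merged, never split.
 * - otherwise: every cluster gets at least one group and the extra
 *   groups are dealt out evenly; CPUs are split only inside a cluster.
 * A cluster with fewer CPUs than its share leaves trailing groups empty.
 */
static void spread_over_clusters(const uint64_t *masks, int nclusters,
				 int ngroups, uint64_t *groups)
{
	int g = 0;

	for (int i = 0; i < ngroups; i++)
		groups[i] = 0;

	if (ngroups < nclusters) {
		for (int c = 0; c < nclusters; c++)
			groups[c * ngroups / nclusters] |= masks[c];
		return;
	}

	int extra = ngroups - nclusters;

	for (int c = 0; c < nclusters; c++) {
		int share = 1 + extra / nclusters + (c < extra % nclusters);
		int w = mask_weight(masks[c]);
		int k = 0;

		if (share > w)
			share = w;
		/* k-th CPU of the cluster lands in chunk k * share / w */
		for (int cpu = 0; cpu < 64; cpu++) {
			if (!(masks[c] & (1ULL << cpu)))
				continue;
			groups[g + k * share / w] |= 1ULL << cpu;
			k++;
		}
		g += share;
	}
}
```

With 8 CPUs in two 4-CPU clusters and 4 groups, each group ends up with two
CPUs from a single cluster; with 3 groups, one cluster is split in two and
the other stays whole, so no group ever straddles a cluster boundary.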
Thanks,
Ming