DMA pool lock contention
Caleb Sander Mateos
csander at purestorage.com
Thu Apr 17 10:06:11 PDT 2025
Hi Linux NVMe folks,
On a 32 KB NVMe passthru read workload, we see 2.4% of CPU time is
spent in _raw_spin_lock_irqsave called from dma_pool_alloc and
dma_pool_free. It looks like each NVMe command with 32 KB of data
allocates a PRP list from the per-nvme_dev prp_small_pool. And every
call to dma_pool_alloc/dma_pool_free takes the dma_pool's lock. Since
the workload submits commands to the same NVMe devices from many CPUs,
it's not surprising that these per-device spinlocks are a significant
source of contention. Is there anything that can be done to reduce
this bottleneck? Would it be possible to give each nvme_queue its own
dma_pool?
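To make the idea concrete, here's a rough sketch of the per-queue
variant I have in mind. The prp_small_pool field in struct nvme_queue
and the nvme_queue_setup_prp_pool() helper are made up for
illustration; today the pools live in struct nvme_dev and are created
once per device in nvme_setup_prp_pools():

	/* Hypothetical: one small PRP-list pool per queue, so each
	 * queue's allocations contend only on its own pool lock. */
	struct nvme_queue {
		/* ... existing fields ... */
		struct dma_pool *prp_small_pool;
	};

	static int nvme_queue_setup_prp_pool(struct nvme_dev *dev,
					     struct nvme_queue *nvmeq,
					     int qid)
	{
		char name[24];

		snprintf(name, sizeof(name), "prp small q%d", qid);
		/* Same geometry as the current per-device small pool:
		 * 256-byte blocks, 256-byte aligned, no boundary. */
		nvmeq->prp_small_pool = dma_pool_create(name, dev->dev,
							256, 256, 0);
		return nvmeq->prp_small_pool ? 0 : -ENOMEM;
	}

The PRP setup path would then call
dma_pool_alloc(nvmeq->prp_small_pool, ...) rather than going through
dev->prp_small_pool, so the lock would only be shared by commands on
the same queue, which are typically submitted from one CPU anyway. The
obvious cost is memory: at least one page per pool per queue per
device.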
Thanks,
Caleb