DMA pool lock contention

Keith Busch kbusch at kernel.org
Thu Apr 17 10:31:18 PDT 2025


On Thu, Apr 17, 2025 at 10:06:11AM -0700, Caleb Sander Mateos wrote:
> Hi Linux NVMe folks,
> On a 32 KB NVMe passthru read workload, we see 2.4% of CPU time is
> spent in _raw_spin_lock_irqsave called from dma_pool_alloc and
> dma_pool_free. It looks like each NVMe command with 32 KB of data
> allocates a PRP list from the per-nvme_dev prp_small_pool. And every
> call to dma_pool_alloc/dma_pool_free takes the dma_pool's lock. Since
> the workload submits commands to the same NVMe devices from many CPUs,
> it's not surprising that these global spinlocks are a significant
> source of contention. Is there anything that can be done to reduce
> this bottleneck? Would it be possible to give each nvme_queue its own
> dma_pool?

That's a bit unfortunate to hear. The locked section for allocating and
freeing from the dma_pool used to be much more costly than it is now,
but at the end of the day the driver is still allocating from it under a
shared lock.
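
For reference, the hot path looks roughly like the below (paraphrased
sketch, not the exact code in drivers/nvme/host/pci.c). Both
dma_pool_alloc() and dma_pool_free() take the pool's spinlock
internally, so every CPU submitting to the same device serializes on
that one lock:

	/* paraphrased sketch of the contended path */
	struct dma_pool *pool = dev->prp_small_pool;	/* one pool per nvme_dev */
	__le64 *prp_list;
	dma_addr_t prp_dma;

	/* dma_pool_alloc() does spin_lock_irqsave(&pool->lock, ...) internally */
	prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
	if (!prp_list)
		return BLK_STS_RESOURCE;
	...
	/* the completion path takes the same lock again */
	dma_pool_free(pool, prp_list, prp_dma);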

It's possible to give each queue its own pool, but there can be
thousands of queues in a system, so that could get pretty resource
intensive. Maybe we just set some arbitrary maximum number of pools and
divvy them up among the CPUs, something like the sketch below.
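
Completely untested sketch of what I have in mind; NVME_MAX_PRP_POOLS,
nr_prp_pools and prp_small_pools are made-up names, just to illustrate
capping the pool count and spreading submitting CPUs across the pools:

	#define NVME_MAX_PRP_POOLS	16

	struct nvme_dev {
		...
		unsigned int nr_prp_pools;
		struct dma_pool *prp_small_pools[NVME_MAX_PRP_POOLS];
	};

	static int nvme_setup_small_prp_pools(struct nvme_dev *dev)
	{
		unsigned int i;

		/* cap the number of pools regardless of CPU count */
		dev->nr_prp_pools = min_t(unsigned int, num_possible_cpus(),
					  NVME_MAX_PRP_POOLS);
		for (i = 0; i < dev->nr_prp_pools; i++) {
			dev->prp_small_pools[i] = dma_pool_create("prp list 256",
					dev->dev, 256, 256, 0);
			if (!dev->prp_small_pools[i])
				return -ENOMEM;
		}
		return 0;
	}

	static struct dma_pool *nvme_small_prp_pool(struct nvme_dev *dev)
	{
		/* spread submitting CPUs across the capped set of pools */
		return dev->prp_small_pools[raw_smp_processor_id() %
					    dev->nr_prp_pools];
	}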


