[PATCHv2] NVMe: IO Queue NUMA locality
Matthew Wilcox
willy at linux.intel.com
Tue Jul 9 09:41:29 EDT 2013
On Mon, Jul 08, 2013 at 01:35:59PM -0600, Keith Busch wrote:
> There is a measurable difference when running IO on a CPU in another
> domain; however, my particular device hits its peak performance on
> either domain at higher queue depths and block sizes, so I'm only able
> to see a difference at lower IO depths. The best gains topped out at a
> 2% improvement with this patch vs the existing code.
That's not too shabby. This is only a two-socket system you're testing
on, so I'd expect larger gains on systems with more sockets.
> I understand this method of allocating and mapping memory may not work
> for CPUs without cache-coherency, but I'm not sure if there is another
> way to allocate coherent memory for a specific NUMA node.
I found a way in the networking drivers:
int ixgbe_setup_tx_resources(struct ixgbe_ring *tx_ring)
{
	struct device *dev = tx_ring->dev;
	int orig_node = dev_to_node(dev);
	int numa_node = -1;
	...
	if (tx_ring->q_vector)
		numa_node = tx_ring->q_vector->numa_node;
	...
	set_dev_node(dev, numa_node);
	tx_ring->desc = dma_alloc_coherent(dev,
					   tx_ring->size,
					   &tx_ring->dma,
					   GFP_KERNEL);
	set_dev_node(dev, orig_node);
	if (!tx_ring->desc)
		tx_ring->desc = dma_alloc_coherent(dev, tx_ring->size,
						   &tx_ring->dma, GFP_KERNEL);
	if (!tx_ring->desc)
		goto err;
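Applied to NVMe, the same trick might look roughly like this in
nvme_alloc_queue() -- just a sketch, assuming the extra NUMA-node argument
your patch adds (I'll call it node_id here) and the existing CQ_SIZE,
SQ_SIZE and nvme_queue_extra helpers:

static struct nvme_queue *nvme_alloc_queue(struct nvme_dev *dev, int qid,
					int depth, int vector, int node_id)
{
	struct device *dmadev = &dev->pci_dev->dev;
	int orig_node = dev_to_node(dmadev);
	unsigned extra = nvme_queue_extra(depth);
	struct nvme_queue *nvmeq;

	/* -1 means "no preference": fall back to the device's own node */
	if (node_id == -1)
		node_id = orig_node;

	nvmeq = kzalloc_node(sizeof(*nvmeq) + extra, GFP_KERNEL, node_id);
	if (!nvmeq)
		return NULL;

	/*
	 * Temporarily point the struct device at the requested node so
	 * dma_alloc_coherent() draws its pages from there, then restore
	 * the device's real node.
	 */
	set_dev_node(dmadev, node_id);
	nvmeq->cqes = dma_alloc_coherent(dmadev, CQ_SIZE(depth),
					&nvmeq->cq_dma_addr, GFP_KERNEL);
	nvmeq->sq_cmds = dma_alloc_coherent(dmadev, SQ_SIZE(depth),
					&nvmeq->sq_dma_addr, GFP_KERNEL);
	set_dev_node(dmadev, orig_node);
	if (!nvmeq->cqes || !nvmeq->sq_cmds)
		goto free_queue;	/* error unwind elided */
	memset((void *)nvmeq->cqes, 0, CQ_SIZE(depth));
	...

Like ixgbe, you could also retry the allocation on the original node if the
node-local one fails, so the driver still comes up when the requested node
is short on memory.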
> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> index 711b51c..9cedfa0 100644
> --- a/drivers/block/nvme-core.c
> +++ b/drivers/block/nvme-core.c
> @@ -1200,7 +1206,7 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
> if (result < 0)
> return result;
>
> - nvmeq = nvme_alloc_queue(dev, 0, 64, 0);
> + nvmeq = nvme_alloc_queue(dev, 0, 64, 0, -1);
> if (!nvmeq)
> return -ENOMEM;
>
I suppose we should really have the admin queue allocated on the node
closest to the device, so pass in dev_to_node(dev) instead of -1 here?
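Something like this (again a sketch; since dev is the struct nvme_dev here,
we have to go through the underlying PCI device to get its node):

	/* Put the admin queue on the node closest to the controller */
	nvmeq = nvme_alloc_queue(dev, 0, 64, 0,
				 dev_to_node(&dev->pci_dev->dev));
	if (!nvmeq)
		return -ENOMEM;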