[PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node

Will Deacon will at kernel.org
Fri Jul 3 12:21:47 EDT 2020


On Mon, Jun 01, 2020 at 11:31:41PM +1200, Barry Song wrote:
> dmam_alloc_coherent() will usually allocate memory from the default CMA area.
> For a common arm64 defconfig without reserved memory in the device tree, there
> is only one CMA area, close to address 0.
> dma_alloc_contiguous() allocates memory without any awareness of NUMA, and the
> SMMU has no customized per-NUMA cma_area:
> struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
> {
>         size_t count = size >> PAGE_SHIFT;
>         struct page *page = NULL;
>         struct cma *cma = NULL;
> 
>         if (dev && dev->cma_area)
>                 cma = dev->cma_area;
>         else if (count > 1)
>                 cma = dma_contiguous_default_area;
> 
>         ...
>         return page;
> }
> 
> If there are N NUMA nodes, the SMMUs on N-1 of them will put their command/event
> queues, etc., on a remote node: the one the default CMA area belongs to,
> probably node 0. Tests show that, on node 2, after sending CMD_SYNC to an empty
> command queue, waiting for the completion of CMD_SYNC takes 550ns without this
> patch and 250ns with it.
> 
> Signed-off-by: Barry Song <song.bao.hua at hisilicon.com>
> ---
>  drivers/iommu/arm-smmu-v3.c | 63 ++++++++++++++++++++++++++++---------
>  1 file changed, 48 insertions(+), 15 deletions(-)
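
A minimal userspace model of the area selection in the quoted dma_alloc_contiguous(), to illustrate the N-1 claim. It assumes one SMMU per NUMA node (none with its own cma_area), a single default CMA area on node 0, 4 KiB pages and a 64 KiB queue; struct cma, struct device, pick_cma() and count_remote() below are simplified hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical userspace stand-ins; only the fields used here exist. */
struct cma { int node; };                        /* node backing the area */
struct device { struct cma *cma_area; int node; };

/* Assumption: the single default CMA area sits on node 0. */
static struct cma default_area = { .node = 0 };
static struct cma *dma_contiguous_default_area = &default_area;

/* Same area choice as the quoted dma_alloc_contiguous(): the device's own
 * area if set, otherwise the default area for multi-page requests.
 * dev->node is never consulted. */
static struct cma *pick_cma(const struct device *dev, size_t size)
{
	size_t count = size >> PAGE_SHIFT;
	struct cma *cma = NULL;

	if (dev && dev->cma_area)
		cma = dev->cma_area;
	else if (count > 1)
		cma = dma_contiguous_default_area;
	return cma;
}

/* Hypothetical helper: one SMMU per node, none with its own cma_area.
 * Returns how many would get queue memory from a CMA area on another node. */
static int count_remote(int nr_nodes, size_t queue_size)
{
	int remote = 0;

	assert(nr_nodes > 0);
	for (int nid = 0; nid < nr_nodes; nid++) {
		struct device smmu = { .cma_area = NULL, .node = nid };
		const struct cma *cma = pick_cma(&smmu, queue_size);

		if (cma && cma->node != smmu.node)
			remote++;
	}
	return remote;
}
```

With four nodes, count_remote(4, 64 * 1024) returns 3: only the node-0 SMMU gets local queue memory, because the device's node never enters the decision.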

I would prefer that the coherent DMA allocator learned about NUMA, rather
than bodging drivers to use the streaming API where it doesn't really
make sense.
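
One way to picture a NUMA-aware coherent allocator is a lookup that consults a CMA area on the device's own node before falling back to the default area, so drivers keep calling dmam_alloc_coherent() unchanged. The sketch below is a hypothetical userspace model under that assumption; pernode_area[], MAX_NODES, NO_NODE, pick_cma_numa() and reserve_pernode() are illustrative names, not an existing kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define MAX_NODES  8    /* hypothetical upper bound on NUMA nodes */
#define NO_NODE    (-1) /* hypothetical: device has no NUMA affinity */

/* Hypothetical userspace stand-ins for kernel types; illustrative only. */
struct cma { int node; };
struct device { struct cma *cma_area; int node; };

/* Assumption: global fallback area on node 0. */
static struct cma default_area = { .node = 0 };
/* Hypothetical per-node areas; a NULL slot means none was reserved. */
static struct cma *pernode_area[MAX_NODES];

/* NUMA-aware choice: device area, then the area on the device's node,
 * then the global default. */
static struct cma *pick_cma_numa(const struct device *dev, size_t size)
{
	size_t count = size >> PAGE_SHIFT;

	if (dev->cma_area)
		return dev->cma_area;
	if (count <= 1)
		return NULL; /* single pages: no CMA area, as in the quoted code */
	if (dev->node != NO_NODE) {
		assert(dev->node >= 0 && dev->node < MAX_NODES);
		if (pernode_area[dev->node])
			return pernode_area[dev->node];
	}
	return &default_area;
}

/* Hypothetical setup: register one area per node for nodes [0, nr_nodes). */
static void reserve_pernode(struct cma areas[], int nr_nodes)
{
	assert(nr_nodes > 0 && nr_nodes <= MAX_NODES);
	for (int nid = 0; nid < nr_nodes; nid++) {
		areas[nid].node = nid;
		pernode_area[nid] = &areas[nid];
	}
}
```

After reserve_pernode(areas, 4), a device on node 2 gets the node-2 area, a device with no node affinity falls back to the default area, and single-page requests get no CMA area, mirroring the count > 1 check in the quoted code.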

I see that you've posted other patches to do that (thanks!), so I'll
disregard this series.

Cheers,

Will



More information about the linux-arm-kernel mailing list