[PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node
Will Deacon
will at kernel.org
Fri Jul 3 12:21:47 EDT 2020
On Mon, Jun 01, 2020 at 11:31:41PM +1200, Barry Song wrote:
> dmam_alloc_coherent() will usually allocate memory from the default CMA. For
> a common arm64 defconfig without reserved memory in the device tree, there is
> only one CMA area, located close to address 0.
> dma_alloc_contiguous() allocates memory without any awareness of NUMA, and the
> SMMU has no customized per-NUMA cma_area:
> struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
> {
> 	size_t count = size >> PAGE_SHIFT;
> 	struct page *page = NULL;
> 	struct cma *cma = NULL;
>
> 	if (dev && dev->cma_area)
> 		cma = dev->cma_area;
> 	else if (count > 1)
> 		cma = dma_contiguous_default_area;
>
> 	...
> 	return page;
> }
>
> If there are N NUMA nodes, the SMMUs on N-1 of them will put their
> command/event queues etc. in a remote node: the one the default CMA belongs
> to, probably node 0. Tests show that, after sending CMD_SYNC to an empty
> command queue on Node2, waiting for the completion of CMD_SYNC takes 550ns
> without this patch and 250ns with it.
>
> Signed-off-by: Barry Song <song.bao.hua at hisilicon.com>
> ---
> drivers/iommu/arm-smmu-v3.c | 63 ++++++++++++++++++++++++++++---------
> 1 file changed, 48 insertions(+), 15 deletions(-)
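For reference, the Node2 CMD_SYNC figures quoted above work out as follows (plain arithmetic on the reported numbers, no new measurements):

```latex
\begin{aligned}
\Delta t &= 550\,\mathrm{ns} - 250\,\mathrm{ns} = 300\,\mathrm{ns},\\
\frac{\Delta t}{550\,\mathrm{ns}} &\approx 0.545 \quad (\text{about } 55\%\ \text{less waiting}),\\
\frac{550\,\mathrm{ns}}{250\,\mathrm{ns}} &= 2.2 \quad (\text{speedup}).
\end{aligned}
```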
I would prefer that the coherent DMA allocator learned about NUMA, rather
than we bodge drivers to use the streaming API where it doesn't really
make sense.
I see that you've posted other patches to do that (thanks!), so I'll
disregard this series.
Cheers,
Will
More information about the linux-arm-kernel mailing list