[PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node

Song Bao Hua (Barry Song) song.bao.hua at hisilicon.com
Sun Jul 5 06:09:38 EDT 2020



> -----Original Message-----
> From: Will Deacon [mailto:will at kernel.org]
> Sent: Saturday, July 4, 2020 4:22 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua at hisilicon.com>
> Cc: hch at lst.de; m.szyprowski at samsung.com; robin.murphy at arm.com;
> linux-arm-kernel at lists.infradead.org; iommu at lists.linux-foundation.org;
> Linuxarm <linuxarm at huawei.com>
> Subject: Re: [PATCH] iommu/arm-smmu-v3: allocate the memory of queues in local numa node
> 
> On Mon, Jun 01, 2020 at 11:31:41PM +1200, Barry Song wrote:
> > dmam_alloc_coherent() will usually allocate memory from the default CMA. For
> > a common arm64 defconfig without reserved memory in the device tree, there is
> > only one CMA area, close to address 0.
> > dma_alloc_contiguous() allocates memory without any knowledge of NUMA, and the
> > SMMU has no customized per-NUMA cma_area.
> > struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
> > {
> >         size_t count = size >> PAGE_SHIFT;
> >         struct page *page = NULL;
> >         struct cma *cma = NULL;
> >
> >         if (dev && dev->cma_area)
> >                 cma = dev->cma_area;
> >         else if (count > 1)
> >                 cma = dma_contiguous_default_area;
> >
> >         ...
> >         return page;
> > }
> >
> > If there are N NUMA nodes, the SMMUs on N-1 of them will put their
> > command/event queues etc. in the remote node that the default CMA belongs to,
> > probably node 0. Tests show that, after sending CMD_SYNC to an empty command
> > queue on node 2, waiting for the completion of CMD_SYNC takes 550ns without
> > this patch and 250ns with it.
> >
> > Signed-off-by: Barry Song <song.bao.hua at hisilicon.com>
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 63 ++++++++++++++++++++++++++++---------
> >  1 file changed, 48 insertions(+), 15 deletions(-)
> 
> I would prefer that the coherent DMA allocator learned about NUMA, rather
> than we bodge drivers to use the streaming API where it doesn't really
> make sense.
> 
> I see that you've posted other patches to do that (thanks!), so I'll
> disregard this series.

Thanks for taking a look, Will. I am indeed using the per-NUMA CMA patchset to
replace this patch, so it is fine to ignore this one.
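To make the difference concrete, here is a small userspace model (not kernel code) of the area-selection step from the dma_alloc_contiguous() excerpt quoted above, next to a per-NUMA variant. Everything in it is a hypothetical stand-in: the simplified struct cma / struct device, default_area, pernuma_area, MAX_NUMNODES and the two pick_area_*() helpers are illustrative names only, and the per-NUMA helper is an assumption about the general idea, not the actual code of the per-NUMA CMA patchset.

```c
#include <stddef.h>

/* Hypothetical node count, for this model only. */
#define MAX_NUMNODES 4

/* Simplified stand-ins, NOT the real kernel structures. */
struct cma {
    int node;               /* NUMA node the reserved area lives on */
};

struct device {
    struct cma *cma_area;   /* per-device area, NULL if none */
    int numa_node;          /* node the device is attached to, -1 if unknown */
};

/* The single default area, near address 0 (assumed to be node 0). */
static struct cma default_area = { .node = 0 };

/* Hypothetical per-NUMA areas, one reserved on each node. */
static struct cma pernuma_area[MAX_NUMNODES] = {
    { .node = 0 }, { .node = 1 }, { .node = 2 }, { .node = 3 },
};

/* Mirrors the quoted excerpt: per-device area first, otherwise the
 * default area for multi-page requests, otherwise no CMA area at all. */
static struct cma *pick_area_default_only(const struct device *dev,
                                          size_t count)
{
    if (dev && dev->cma_area)
        return dev->cma_area;
    if (count > 1)
        return &default_area;
    return NULL;
}

/* Hypothetical per-NUMA variant: same order, but a multi-page request
 * prefers the area on the device's own node before the default one. */
static struct cma *pick_area_pernuma(const struct device *dev, size_t count)
{
    if (dev && dev->cma_area)
        return dev->cma_area;
    if (count > 1) {
        if (dev && dev->numa_node >= 0 && dev->numa_node < MAX_NUMNODES)
            return &pernuma_area[dev->numa_node];
        return &default_area;
    }
    return NULL;
}
```

For an SMMU on node 2 with no per-device area, a multi-page queue allocation resolves to the node 0 area with the default-only logic and to the node 2 area with the per-NUMA variant; a single-page request returns NULL in both, standing in for the case where no CMA area is used.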


> 
> Cheers,
> 
> Will

Thanks
Barry




More information about the linux-arm-kernel mailing list