[PATCH 0/6] dma-mapping: arm64: support batched cache sync
Barry Song
21cnbao at gmail.com
Thu Dec 18 22:12:28 PST 2025
It is unclear why, but the cover letter was missed in the
initial posting, even though Gmail shows it as sent. I am
resending it here as a reply to check whether it appears on
the mailing list. Apologies for the inconvenience.
On Fri, Dec 19, 2025 at 1:37 PM Barry Song <21cnbao at gmail.com> wrote:
>
> From: Barry Song <v-songbaohua at oppo.com>
>
> Many embedded ARM64 SoCs still lack hardware cache coherency support, which
> causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.
>
> For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
> sync APIs perform cache maintenance one entry at a time. After each entry,
> the implementation synchronously waits for the corresponding region’s
> D-cache operations to complete. On architectures like arm64, efficiency can
> be improved by issuing all entries’ operations first and then performing a
> single batched wait for completion.
>
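
For readers who want the idea above in concrete form, here is a minimal
userspace sketch. It is not code from the patch set. It only counts how
many completion waits each strategy would issue. All names in it
(sg_entry, sync_stats, clean_range_nosync, wait_for_completion_barrier,
sync_per_entry, sync_batched) are hypothetical, and the 64-byte cache
line is an assumption. On real arm64 hardware the two stand-in helpers
would correspond to per-line cache maintenance and a barrier.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64UL  /* assumed line size, for illustration only */

/* Hypothetical stand-in for one scatter-gather entry. */
struct sg_entry {
    size_t len;  /* length of the region in bytes */
};

/* Counters describing the work one strategy performed. */
struct sync_stats {
    unsigned long line_ops;  /* per-cache-line maintenance operations issued */
    unsigned long waits;     /* completion waits (barriers) issued */
};

/* Stand-in: issue maintenance for every line in a region, without waiting. */
static void clean_range_nosync(struct sync_stats *st, size_t len)
{
    st->line_ops += (len + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE;
}

/* Stand-in: wait until all previously issued maintenance has completed. */
static void wait_for_completion_barrier(struct sync_stats *st)
{
    st->waits++;
}

/* Per-entry behaviour: issue one entry's operations, then wait, and repeat. */
static struct sync_stats sync_per_entry(const struct sg_entry *sg, size_t nents)
{
    struct sync_stats st = { 0, 0 };

    assert(sg != NULL);
    for (size_t i = 0; i < nents; i++) {
        clean_range_nosync(&st, sg[i].len);
        wait_for_completion_barrier(&st);
    }
    return st;
}

/* Batched behaviour: issue every entry's operations first, then wait once. */
static struct sync_stats sync_batched(const struct sg_entry *sg, size_t nents)
{
    struct sync_stats st = { 0, 0 };

    assert(sg != NULL);
    for (size_t i = 0; i < nents; i++)
        clean_range_nosync(&st, sg[i].len);
    wait_for_completion_barrier(&st);
    return st;
}
```

For a list of nents entries, both functions issue the same number of
per-line operations. Only the number of waits differs: nents in the
per-entry case and one in the batched case.
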
> Tangquan's results show that batched synchronization can reduce
> dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
> phone platform (MediaTek Dimensity 9500). The tests were performed by
> pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
> running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB =
> 2560 sg entries per buffer) for 200 iterations and then averaging the
> results.
>
> I also ran this patch set on an RK3588 Rock5B+ board and
> observed that millions of DMA sync operations were batched.
>
> Changes since RFC:
> * Dropped many #ifdef/#else/#endif blocks, as suggested by Catalin and
> Marek, thanks!
> * Added batching for IOVA link/unlink, marked as RFC because I lack the
> hardware. Suggested by Marek, thanks!
>
> RFC link:
> https://lore.kernel.org/lkml/20251029023115.22809-1-21cnbao@gmail.com/
>
> Barry Song (6):
> arm64: Provide dcache_by_myline_op_nosync helper
> arm64: Provide dcache_clean_poc_nosync helper
> arm64: Provide dcache_inval_poc_nosync helper
> arm64: Provide arch_sync_dma_ batched helpers
> dma-mapping: Allow batched DMA sync operations if supported by the
> arch
> dma-iommu: Allow DMA sync batching for IOVA link/unlink
>
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/assembler.h | 79 +++++++++++++++++++-------
> arch/arm64/include/asm/cacheflush.h | 2 +
> arch/arm64/mm/cache.S | 58 +++++++++++++++----
> arch/arm64/mm/dma-mapping.c | 24 ++++++++
> drivers/iommu/dma-iommu.c | 12 +++-
> include/linux/dma-map-ops.h | 22 ++++++++
> kernel/dma/Kconfig | 3 +
> kernel/dma/direct.c | 28 +++++++---
> kernel/dma/direct.h | 86 +++++++++++++++++++++++++----
> 10 files changed, 262 insertions(+), 53 deletions(-)
>
> --
> 2.39.3 (Apple Git-146)
>