[PATCH v3 0/5] dma-mapping: arm64: support batched cache sync

Marek Szyprowski m.szyprowski at samsung.com
Tue Mar 3 08:33:37 PST 2026


On 28.02.2026 23:11, Barry Song wrote:
> From: Barry Song <baohua at kernel.org>
>
> Many embedded ARM64 SoCs still lack hardware cache coherency support, which
> causes DMA mapping operations to appear as hotspots in on-CPU flame graphs.
>
> For an SG list with *nents* entries, the current dma_map/unmap_sg() and DMA
> sync APIs perform cache maintenance one entry at a time. After each entry,
> the implementation synchronously waits for the corresponding region’s
> D-cache operations to complete. On architectures like arm64, efficiency can
> be improved by issuing all entries’ operations first and then performing a
> single batched wait for completion.
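
To illustrate the paragraph above, here is a minimal user-space sketch of the
difference between waiting after every entry and waiting once per batch. It is
not code from this series: struct sg_entry, cache_op_nosync(), cache_op_wait()
and the two counters are hypothetical stand-ins. They only count how many
operations are issued and how many completion waits happen (on arm64 the wait
would presumably be a barrier such as DSB).

```c
#include <stddef.h>

/* Hypothetical model of one scatter-gather entry (not the kernel's struct). */
struct sg_entry {
	unsigned long addr;
	size_t len;
};

static unsigned int ops_issued;    /* cache maintenance operations issued */
static unsigned int barrier_waits; /* completion waits performed */

static void reset_counters(void)
{
	ops_issued = 0;
	barrier_waits = 0;
}

/* Stand-in for issuing D-cache maintenance on one region without waiting. */
static void cache_op_nosync(const struct sg_entry *e)
{
	(void)e;
	ops_issued++;
}

/* Stand-in for the barrier that waits for all outstanding operations. */
static void cache_op_wait(void)
{
	barrier_waits++;
}

/* Per-entry behaviour as described above: issue, then wait, for each entry. */
static void sync_sg_per_entry(const struct sg_entry *sg, size_t nents)
{
	for (size_t i = 0; i < nents; i++) {
		cache_op_nosync(&sg[i]);
		cache_op_wait();
	}
}

/* Batched behaviour: issue every entry's operation, then wait exactly once. */
static void sync_sg_batched(const struct sg_entry *sg, size_t nents)
{
	for (size_t i = 0; i < nents; i++)
		cache_op_nosync(&sg[i]);
	cache_op_wait();
}
```

In this model, a 10 MB buffer split into 4 KB entries (2560 entries) costs
2560 waits with sync_sg_per_entry() and a single wait with sync_sg_batched(),
while the number of issued operations is the same in both cases.
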
>
> Tangquan's results show that batched synchronization can reduce
> dma_map_sg() time by 64.61% and dma_unmap_sg() time by 66.60% on an MTK
> phone platform (MediaTek Dimensity 9500). The tests were performed by
> pinning the task to CPU7 and fixing the CPU frequency at 2.6 GHz,
> running dma_map_sg() and dma_unmap_sg() on 10 MB buffers (10 MB / 4 KB
> = 2560 sg entries per buffer) for 200 iterations and then averaging
> the results.
>
> Thanks to Xueyuan for volunteering to take on the testing tasks. He
> put significant effort into validating paths such as IOVA link/unlink
> and SWIOTLB on RK3588 boards with NVMe.

Catalin, Will, I would like to merge this into the dma-mapping tree. Please
give your ack or comment if you are okay with the arm64-related parts.


> v3:
>   * Fold patches 5/8, 7/8, and 8/8 into patch 4/8 as suggested by Leon,
>     reducing the series from 8 patches to 5;
>   * Fix the SWIOTLB path by ensuring a sync is issued before memcpy;
>   * Add ARCH_HAS_BATCHED_DMA_SYNC Kconfig as suggested by Leon;
>   * Collect Reviewed-by tags from Leon and Juergen. Leon's tag is not
>     added to patch 4 since it has changed significantly since v2 and
>     requires re-review;
>   * Rename some asm macros and functions as suggested by Will;
>   * Add Xueyuan's Tested-by. His help is greatly appreciated!
> v2 link:
>   https://lore.kernel.org/lkml/20251226225254.46197-1-21cnbao@gmail.com/
>
> v2:
>   * Refine a large amount of arm64 asm code based on feedback from
>     Robin, thanks!
>   * Drop batch_add APIs and always use arch_sync_dma_for_* + flush,
>     even for a single buffer, based on Leon’s suggestion, thanks!
>   * Refine a large amount of code based on feedback from Leon, thanks!
>   * Also add batch support for iommu_dma_sync_sg_for_{cpu,device}
> v1 link:
>   https://lore.kernel.org/lkml/20251219053658.84978-1-21cnbao@gmail.com/
>
> v1, diff with RFC:
>   * Drop a large number of #ifdef/#else/#endif blocks based on feedback
>     from Catalin and Marek, thanks!
>   * Also add batched iova link/unlink support, marked as RFC since I lack
>     the required hardware. This was suggested by Marek, thanks!
> RFC link:
>   https://lore.kernel.org/lkml/20251029023115.22809-1-21cnbao@gmail.com/
>
> Barry Song (5):
>    arm64: Provide dcache_by_myline_op_nosync helper
>    arm64: Provide dcache_clean_poc_nosync helper
>    arm64: Provide dcache_inval_poc_nosync helper
>    dma-mapping: Separate DMA sync issuing and completion waiting
>    dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg
>
>   arch/arm64/Kconfig                  |  1 +
>   arch/arm64/include/asm/assembler.h  | 25 ++++++++++---
>   arch/arm64/include/asm/cache.h      |  5 +++
>   arch/arm64/include/asm/cacheflush.h |  2 +
>   arch/arm64/kernel/relocate_kernel.S |  3 +-
>   arch/arm64/mm/cache.S               | 57 +++++++++++++++++++++++------
>   arch/arm64/mm/dma-mapping.c         |  4 +-
>   drivers/iommu/dma-iommu.c           | 35 ++++++++++++++----
>   drivers/xen/swiotlb-xen.c           | 24 ++++++++----
>   include/linux/dma-map-ops.h         |  6 +++
>   kernel/dma/Kconfig                  |  3 ++
>   kernel/dma/direct.c                 | 23 +++++++++---
>   kernel/dma/direct.h                 | 21 ++++++++---
>   kernel/dma/mapping.c                |  6 +--
>   kernel/dma/swiotlb.c                |  7 +++-
>   15 files changed, 171 insertions(+), 51 deletions(-)
>
> Cc: Leon Romanovsky <leon at kernel.org>
> Cc: Marek Szyprowski <m.szyprowski at samsung.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Will Deacon <will at kernel.org>
> Cc: Ada Couprie Diaz <ada.coupriediaz at arm.com>
> Cc: Ard Biesheuvel <ardb at kernel.org>
> Cc: Marc Zyngier <maz at kernel.org>
> Cc: Anshuman Khandual <anshuman.khandual at arm.com>
> Cc: Ryan Roberts <ryan.roberts at arm.com>
> Cc: Suren Baghdasaryan <surenb at google.com>
> Cc: Robin Murphy <robin.murphy at arm.com>
> Cc: Joerg Roedel <joro at 8bytes.org>
> Cc: Juergen Gross <jgross at suse.com>
> Cc: Stefano Stabellini <sstabellini at kernel.org>
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko at epam.com>
> Cc: Tangquan Zheng <zhengtangquan at oppo.com>
> Cc: Huacai Zhou <zhouhuacai at oppo.com>
> Cc: Xueyuan Chen <xueyuan.chen21 at gmail.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland