[PATCH v2 4/8] dma-mapping: Separate DMA sync issuing and completion waiting
Marek Szyprowski
m.szyprowski at samsung.com
Wed Dec 31 06:43:19 PST 2025
On 28.12.2025 22:38, Barry Song wrote:
> On Mon, Dec 29, 2025 at 3:49 AM Leon Romanovsky <leon at kernel.org> wrote:
>> On Sun, Dec 28, 2025 at 10:45:13AM +1300, Barry Song wrote:
>>> On Sun, Dec 28, 2025 at 9:07 AM Leon Romanovsky <leon at kernel.org> wrote:
>>>> On Sat, Dec 27, 2025 at 11:52:44AM +1300, Barry Song wrote:
>>>>> From: Barry Song <baohua at kernel.org>
>>>>>
>>>>> Currently, arch_sync_dma_for_cpu and arch_sync_dma_for_device
>>>>> always wait for the completion of each DMA buffer. That is,
>>>>> issuing the DMA sync and waiting for completion is done in a
>>>>> single API call.
>>>>>
>>>>> For scatter-gather lists with multiple entries, this means
>>>>> issuing and waiting is repeated for each entry, which can hurt
>>>>> performance. Architectures like ARM64 may be able to issue all
>>>>> DMA sync operations for all entries first and then wait for
>>>>> completion together.
>>>>>
>>>>> To address this, arch_sync_dma_for_* now issues DMA operations in
>>>>> batch, followed by a flush. On ARM64, the flush is implemented
>>>>> using a dsb instruction within arch_sync_dma_flush().
>>>>>
>>>>> For now, add arch_sync_dma_flush() after each
>>>>> arch_sync_dma_for_*() call. arch_sync_dma_flush() is defined as a
>>>>> no-op on all architectures except arm64, so this patch does not
>>>>> change existing behavior. Subsequent patches will introduce true
>>>>> batching for SG DMA buffers.
>>>>>
>>>>> Cc: Leon Romanovsky <leon at kernel.org>
>>>>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>>>>> Cc: Will Deacon <will at kernel.org>
>>>>> Cc: Marek Szyprowski <m.szyprowski at samsung.com>
>>>>> Cc: Robin Murphy <robin.murphy at arm.com>
>>>>> Cc: Ada Couprie Diaz <ada.coupriediaz at arm.com>
>>>>> Cc: Ard Biesheuvel <ardb at kernel.org>
>>>>> Cc: Marc Zyngier <maz at kernel.org>
>>>>> Cc: Anshuman Khandual <anshuman.khandual at arm.com>
>>>>> Cc: Ryan Roberts <ryan.roberts at arm.com>
>>>>> Cc: Suren Baghdasaryan <surenb at google.com>
>>>>> Cc: Joerg Roedel <joro at 8bytes.org>
>>>>> Cc: Juergen Gross <jgross at suse.com>
>>>>> Cc: Stefano Stabellini <sstabellini at kernel.org>
>>>>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko at epam.com>
>>>>> Cc: Tangquan Zheng <zhengtangquan at oppo.com>
>>>>> Signed-off-by: Barry Song <baohua at kernel.org>
>>>>> ---
>>>>> arch/arm64/include/asm/cache.h | 6 ++++++
>>>>> arch/arm64/mm/dma-mapping.c | 4 ++--
>>>>> drivers/iommu/dma-iommu.c | 37 +++++++++++++++++++++++++---------
>>>>> drivers/xen/swiotlb-xen.c | 24 ++++++++++++++--------
>>>>> include/linux/dma-map-ops.h | 6 ++++++
>>>>> kernel/dma/direct.c | 8 ++++++--
>>>>> kernel/dma/direct.h | 9 +++++++--
>>>>> kernel/dma/swiotlb.c | 4 +++-
>>>>> 8 files changed, 73 insertions(+), 25 deletions(-)
>>>> <...>
>>>>
>>>>> +#ifndef arch_sync_dma_flush
>>>>> +static inline void arch_sync_dma_flush(void)
>>>>> +{
>>>>> +}
>>>>> +#endif
>>>> Over the weekend I realized a useful advantage of the ARCH_HAVE_* config
>>>> options: they make it straightforward to inspect the entire DMA path simply
>>>> by looking at the .config.
>>> I am not quite sure how much this benefits users, as the same
>>> information could also be obtained by grepping for
>>> #define arch_sync_dma_flush in the source code.
>> It differs slightly. Users no longer need to grep around or guess whether this
>> platform uses the arch_sync_dma_flush path. A simple grep for ARCH_HAVE_ in
>> /proc/config.gz provides the answer.
> In any case, it is only two or three lines of code, so I am fine with
> either approach. Perhaps Marek, Robin, and others have an opinion here?
If possible, I would suggest following the style already used in the
given code, even if it means a slightly larger patch.
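
For illustration only, and assuming the style meant here is the
Kconfig-gated pattern that include/linux/dma-map-ops.h already uses for
arch_sync_dma_for_device() (CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE), the new
hook could be declared roughly as below. The symbol
CONFIG_ARCH_HAS_SYNC_DMA_FLUSH is hypothetical and does not come from the
posted patch.

```c
/*
 * Sketch only: CONFIG_ARCH_HAS_SYNC_DMA_FLUSH is a hypothetical Kconfig
 * symbol used for illustration; it is not part of the posted series.
 */
#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FLUSH
/* The architecture provides the real implementation out of line. */
void arch_sync_dma_flush(void);
#else
/* Every other architecture gets a no-op, so behavior is unchanged. */
static inline void arch_sync_dma_flush(void)
{
}
#endif
```

An architecture would then select the symbol in its Kconfig, which is
what makes the choice visible in .config.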
>>>> Thanks,
>>>> Reviewed-by: Leon Romanovsky <leonro at nvidia.com>
>>> Thanks very much, Leon, for reviewing this over the weekend. One thing
>>> you might have missed is that I place arch_sync_dma_flush() after all
>>> arch_sync_dma_for_*() calls, for both single and sg cases. I also
>>> used a Python script to scan the code and verify that every
>>> arch_sync_dma_for_*() is followed by arch_sync_dma_flush(), to ensure
>>> that no call is left out.
>>>
>>> In the subsequent patches, for sg cases, the per-entry flush is
>>> replaced by a single flush of the entire sg. Each sg case has
>>> different characteristics: some are straightforward, while others
>>> can be tricky and involve additional contexts.
>> I didn't overlook it, and I understand your rationale. However, this is
>> not how kernel patches should be structured. You should not introduce
>> code in patch X and then move it elsewhere in patch X + Y.
> I am not quite convinced by this concern. This patch only
> separates DMA sync issuing from completion waiting, and it
> reflects that the development is done step by step.
>
>> Place the code in the correct location from the start. Your patches are
>> small enough to review as is.
> My point is that this patch places the code in the correct locations
> from the start. It splits arch_sync_dma_for_*() into
> arch_sync_dma_for_*() plus arch_sync_dma_flush() everywhere, without
> introducing any functional changes from the outset.
> The subsequent patches clearly show which parts are truly batched.
>
> In the meantime, I do not have a strong preference here. If you think
> it is better to move some of the straightforward batching code here,
> I can follow that approach. Perhaps I could move patch 5, patch 8,
> and the iommu_dma_iova_unlink_range_slow change from patch 7 here,
> while keeping
>
> [PATCH 6] dma-mapping: Support batch mode for
> dma_direct_{map,unmap}_sg
>
> and the IOVA link part from patch 7 as separate patches, since that
> part is not straightforward. The IOVA link changes affect both
> __dma_iova_link() and dma_iova_sync(), which are two separate
> functions and require a deeper understanding of the contexts to
> determine correctness. That part also lacks testing.
>
> Would that be okay with you?
Yes, this will be okay. The changes are easy to understand, so there is
no need to split them into such small steps.
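
To make the difference between the two steps concrete, here is a minimal
user-space sketch, not kernel code. The names struct sg_entry,
sim_sync_dma_for_device() and sim_sync_dma_flush() are hypothetical
stand-ins that only count operations; on arm64 the flush would correspond
to the dsb mentioned in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatter-gather entry. */
struct sg_entry {
	size_t paddr;
	size_t len;
};

static unsigned int issued;   /* cache maintenance operations issued */
static unsigned int barriers; /* completion waits performed */

/* Issue maintenance for one buffer without waiting for completion. */
static void sim_sync_dma_for_device(const struct sg_entry *e)
{
	assert(e != NULL);
	issued++;
}

/* Wait for everything issued so far to complete. */
static void sim_sync_dma_flush(void)
{
	barriers++;
}

/* Shape of this patch: a flush follows every issue, so nothing changes. */
static void sync_sg_per_entry(const struct sg_entry *sg, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		sim_sync_dma_for_device(&sg[i]);
		sim_sync_dma_flush();
	}
}

/* Shape of the later patches: issue for all entries, then flush once. */
static void sync_sg_batched(const struct sg_entry *sg, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sim_sync_dma_for_device(&sg[i]);
	if (n)
		sim_sync_dma_flush();
}
```

For a list of n entries both variants issue n operations, but the
batched one waits once instead of n times.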
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland