[PATCH] arm64/dma-mapping: Fix arch_sync_dma_for_device to respect dir parameter
Catalin Marinas
catalin.marinas at arm.com
Wed Aug 20 06:25:27 PDT 2025
On Wed, Aug 20, 2025 at 11:28:06AM +0100, John Cox via B4 Relay wrote:
> All other architectures do different cache operations depending on the
> dir parameter. Fix arm64 to do the same.

I suspect that's a bug in the users of the DMA API. We shouldn't modify
the arm64 implementation to cope with them.

> This fixes udmabuf operations when syncing for read e.g. when the CPU
> reads back a V4L2 decoded frame buffer.
>
> Signed-off-by: John Cox <john.cox at raspberrypi.com>
> ---
> This patch makes the arch_sync_dma_for_device function on arm64
> do different things depending on the value of the dir parameter. In
> particular it does a cache invalidate operation if the dir flag is
> set to DMA_FROM_DEVICE. The current code does a writeback without
> invalidate under all circumstances. Nearly all other architectures do
> an invalidate if the direction is FROM_DEVICE which seems like the
> correct thing to do to me.

So does arm64, but in arch_sync_dma_for_cpu(). That's the correct place
to do it; otherwise, after arch_sync_dma_for_device(), speculative loads
by the CPU may populate the caches with stale data before the device has
finished writing.

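To make the ordering argument concrete, here is a toy model (not kernel
code) with one word of memory and one cache line. All names in it are
made up for illustration, and the "speculative load" is simply an extra
cpu_load() whose result is discarded. It only sketches why an invalidate
issued before the DMA cannot replace the one issued after it.

```c
#include <stdbool.h>

/* Toy model: one word of memory and the single cache line covering it. */
struct toy_system {
	int mem;	/* what a non-coherent device reads and writes */
	int line;	/* the CPU's cached copy */
	bool valid;
	bool dirty;
};

static int cpu_load(struct toy_system *s)
{
	if (!s->valid) {	/* miss: fill the line from memory */
		s->line = s->mem;
		s->valid = true;
		s->dirty = false;
	}
	return s->line;
}

static void cache_clean(struct toy_system *s)
{
	if (s->valid && s->dirty) {	/* write back, keep the line */
		s->mem = s->line;
		s->dirty = false;
	}
}

static void cache_inval(struct toy_system *s)
{
	s->valid = false;	/* drop the line without writing back */
	s->dirty = false;
}

static void device_write(struct toy_system *s, int v)
{
	s->mem = v;		/* DMA bypasses the CPU cache */
}

/* Invalidate only before the DMA; returns what the CPU finally reads. */
int read_inval_before_dma(int dma_value)
{
	struct toy_system s = { .mem = 0 };

	cache_inval(&s);	/* "for device" step doing the invalidate */
	(void)cpu_load(&s);	/* speculative load caches the old 0 */
	device_write(&s, dma_value);
	return cpu_load(&s);	/* hits the stale line */
}

/* Clean before the DMA, invalidate after it ("for CPU" step). */
int read_inval_after_dma(int dma_value)
{
	struct toy_system s = { .mem = 0 };

	cache_clean(&s);
	(void)cpu_load(&s);	/* same speculative load */
	device_write(&s, dma_value);
	cache_inval(&s);	/* discards the speculatively filled line */
	return cpu_load(&s);	/* misses and refills from memory */
}
```

In this model read_inval_before_dma(42) returns the stale 0, while
read_inval_after_dma(42) returns 42.
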
> This patch fixes a problem I was having with udmabuf allocated
> dmabufs. It also fixes a very similar problem I had with dma_heap
> allocated dmabuf but that occurred very much less frequently and I
> haven't traced exactly what was going on there.
>
> My problem (on a Raspberry Pi5):
>
> [Userland]
> Alloc memory with memfd_create + ftruncate
> Derive dmabuf from memfd with udmabuf
> Close memfd
> Queue dmabuf into V4L2 with QBUF
> <decode a video frame>
> Extract dmabuf from V4L2 with DQBUF
> Map dmabuf for read with mmap
> Sync for read with DMA_BUF_IOCTL_SYNC with (DMA_BUF_SYNC_START |
> DMA_BUF_SYNC_READ)
> Read buffer
> Sync end
> Unmap
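
For reference, the sync bracket in the userland sequence quoted above
can be sketched as follows. This is a minimal sketch, assuming the
dmabuf fd has already been dequeued and mapped; dmabuf_sync() and
dmabuf_read() are hypothetical helper names, only the ioctl and its
flags come from <linux/dma-buf.h>.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue one DMA_BUF_IOCTL_SYNC, retrying if it was interrupted. */
static int dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/*
 * Copy len bytes out of an already mmap()ed dmabuf (hypothetical helper).
 * The CPU access is bracketed by SYNC_START and SYNC_END for reading.
 * Returns 0 on success, -1 if either sync ioctl fails.
 */
int dmabuf_read(int dmabuf_fd, const void *map, void *dst, size_t len)
{
	if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ))
		return -1;

	memcpy(dst, map, len);	/* the "Read buffer" step */

	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

The SYNC_START call is what reaches the exporter's begin_cpu_access
callback, begin_cpu_udmabuf() in the udmabuf case.
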

Between the device writing to the buffer and the "read buffer" step
above, is there a call to arch_sync_dma_for_cpu()? A quick look at
begin_cpu_udmabuf() shows a dma_sync_sgtable_for_cpu(), though there is
a branch where this is skipped. get_sg_table() seems to do a DMA map
which I think ends up in arch_sync_dma_for_device() but the sync
for-CPU is skipped.

An attempt at a udmabuf fix (untested):

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 40399c26e6be..9ab4a6c01143 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -256,10 +256,11 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
 			ret = PTR_ERR(ubuf->sg);
 			ubuf->sg = NULL;
 		}
-	} else {
-		dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
 	}
 
+	if (ubuf->sg)
+		dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
+
 	return ret;
 }
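
The effect of that change on the control flow can be sketched with a
small stand-alone model. Everything here is a stand-in: struct toy_ubuf
and the toy_* functions are hypothetical, and the counter only records
where a for-CPU sync would have been issued. It is not the driver code.

```c
#include <stdbool.h>

struct toy_ubuf {
	bool sg_mapped;	/* stands in for ubuf->sg != NULL */
	bool map_fails;	/* forces the toy mapping step to fail */
	int cpu_syncs;	/* number of for-CPU syncs issued so far */
};

/* Stand-in for get_sg_table(): maps the buffer, no for-CPU sync. */
static int toy_get_sg_table(struct toy_ubuf *u)
{
	if (u->map_fails)
		return -1;	/* stand-in for an ERR_PTR() result */
	u->sg_mapped = true;
	return 0;
}

/* Flow before the change: the first call maps but never syncs. */
int toy_begin_cpu_old(struct toy_ubuf *u)
{
	int ret = 0;

	if (!u->sg_mapped)
		ret = toy_get_sg_table(u);
	else
		u->cpu_syncs++;

	return ret;
}

/* Flow after the change: sync whenever a mapping exists. */
int toy_begin_cpu_new(struct toy_ubuf *u)
{
	int ret = 0;

	if (!u->sg_mapped)
		ret = toy_get_sg_table(u);

	if (u->sg_mapped)
		u->cpu_syncs++;

	return ret;
}
```

On a fresh toy_ubuf, the first toy_begin_cpu_old() call leaves cpu_syncs
at 0, whereas the first toy_begin_cpu_new() call raises it to 1. If the
mapping step fails, neither variant syncs.
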
> I get old (zero) data out of the "Read buffer" stage in some cache
> lines sometimes.
> It doesn't matter which way round the mmap & sync are.
>
> I am aware that there is a patchset going through for udmabuf that may
> well fix the udmabuf case above, but given that this patch fixes
> something similar in dma_heap/system too I think it is still worth
> having.
> ---
> arch/arm64/mm/dma-mapping.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index b2b5792b2caaf81ccfc3204c94395bb0faeabddd..51c43c1f563015139e365ed86f0f5f0d9483fa7f 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -16,8 +16,22 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
>  			      enum dma_data_direction dir)
>  {
>  	unsigned long start = (unsigned long)phys_to_virt(paddr);
> +	unsigned long end = start + size;
>
> -	dcache_clean_poc(start, start + size);
> +	switch (dir) {
> +	case DMA_BIDIRECTIONAL:
> +		dcache_clean_inval_poc(start, end);
> +		break;
> +	case DMA_TO_DEVICE:
> +		dcache_clean_poc(start, end);
> +		break;
> +	case DMA_FROM_DEVICE:
> +		dcache_inval_poc(start, end);
> +		break;
> +	case DMA_NONE:
> +	default:
> +		break;
> +	}
>  }

As explained above, that's not the right fix. We need to identify what's
missing on the ioctl() paths.

--
Catalin