[RFC PATCH v2 2/3] arm64: add IOMMU dma_ops
robin.murphy at arm.com
Mon Mar 9 13:09:49 PDT 2015
On 09/03/15 17:59, Russell King - ARM Linux wrote:
>>> For a noncoherent device, dma_map_single() will end up calling
>>> __dma_map_area() with the page offset and size of the original request, so
>>> the updated part gets flushed by VA, and the rest of the page isn't touched
>>> if it doesn't need to be. On the other hand if the page tables were
>>> allocated with dma_alloc_coherent() in the first place, then just calling
>>> dma_sync_single_for_device() for the updated region should suffice.
> That's wrong. dma_sync_single_*() is not permitted to be called on
> coherently allocated memory. Where coherent memory needs to be remapped,
> dma_sync_single_*() will panic the kernel.
> If it's in coherent memory, all you should need is the appropriate
> memory barrier to ensure that the DMA agent can see the writes.
You're quite right, that's the whole point of *coherent* allocations
after all. I got my syncs and barriers muddled there.
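To illustrate the "barrier, not sync" point for coherent memory, here is a
rough userspace analogy using C11 atomics. It is only a sketch: the names
toy_desc, toy_ring, toy_tail and toy_publish are made up for illustration, a
real device is not a C11 thread, and kernel code would use one of the kernel's
own write barriers (wmb() or similar) rather than atomic_thread_fence().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor layout, standing in for a page table entry. */
struct toy_desc {
	uint64_t addr;
	uint32_t len;
};

#define TOY_RING_SIZE 4

/* Stands in for coherently allocated memory shared with a DMA agent. */
static struct toy_desc toy_ring[TOY_RING_SIZE];
static atomic_uint toy_tail;

/*
 * Fill in a descriptor, then publish it by advancing the tail index.
 * The release fence orders the descriptor writes before the tail update,
 * so an observer that sees the new tail also sees the descriptor contents.
 * No cache maintenance is involved, only ordering.
 */
static unsigned int toy_publish(uint64_t addr, uint32_t len)
{
	unsigned int slot =
		atomic_load_explicit(&toy_tail, memory_order_relaxed) % TOY_RING_SIZE;

	toy_ring[slot].addr = addr;
	toy_ring[slot].len = len;

	/* Userspace stand-in for the kernel write barrier. */
	atomic_thread_fence(memory_order_release);

	atomic_fetch_add_explicit(&toy_tail, 1, memory_order_relaxed);
	return slot;
}
```

The only thing separating the data writes from the "go" signal is the fence,
which is all a coherent allocation should need.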
>>> Where exactly would you call the dma_unmap? It seems a bit strange to
>>> be repeatedly calling dma_map and never calling dma_unmap. I don't see it
>>> explicitly forbidden in the docs anywhere to do this but it seems like
>>> it would be violating the implicit handoff of dma_map/dma_unmap.
>> I think ideally you'd call dma_map_page when you first create the page
>> table, dma_sync_single_for_device on any update, and dma_unmap_page when you
>> tear it down, and you'd also use the appropriate DMA addresses everywhere
>> instead of physical addresses.
> dma_map_page() ownership changes CPU->DMA
> dma_sync_single_for_cpu() ownership changes DMA->CPU
> dma_sync_single_for_device() ownership changes CPU->DMA
> dma_unmap_page() ownership changes DMA->CPU
> It's invalid to miss out the pairing that gives those ownership changes.
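The ownership rules listed above can be modelled as a small state machine.
This is a toy userspace sketch, not kernel code: toy_buf and the toy_*()
helpers are hypothetical names that only track who owns the buffer, and each
helper returns false when called out of sequence. It assumes the strict
reading given above, where the CPU may only touch the buffer while it owns it.

```c
#include <stdbool.h>

enum toy_owner { TOY_OWNER_CPU, TOY_OWNER_DEVICE };

/* Tracks ownership of one streaming DMA buffer (illustrative only). */
struct toy_buf {
	enum toy_owner owner;
	bool mapped;
};

/* Models dma_map_page(): ownership changes CPU->DMA. */
static bool toy_map(struct toy_buf *b)
{
	if (b->mapped)
		return false;
	b->mapped = true;
	b->owner = TOY_OWNER_DEVICE;
	return true;
}

/* Models dma_sync_single_for_cpu(): ownership changes DMA->CPU. */
static bool toy_sync_for_cpu(struct toy_buf *b)
{
	if (!b->mapped || b->owner != TOY_OWNER_DEVICE)
		return false;
	b->owner = TOY_OWNER_CPU;
	return true;
}

/* Models dma_sync_single_for_device(): ownership changes CPU->DMA. */
static bool toy_sync_for_device(struct toy_buf *b)
{
	if (!b->mapped || b->owner != TOY_OWNER_CPU)
		return false;
	b->owner = TOY_OWNER_DEVICE;
	return true;
}

/* Models dma_unmap_page(): ownership returns to the CPU. */
static bool toy_unmap(struct toy_buf *b)
{
	if (!b->mapped)
		return false;
	b->mapped = false;
	b->owner = TOY_OWNER_CPU;
	return true;
}

/* The CPU may only update the buffer while it owns it. */
static bool toy_cpu_may_write(const struct toy_buf *b)
{
	return b->owner == TOY_OWNER_CPU;
}
```

Under this model, an update to an already-mapped buffer has to go
toy_sync_for_cpu(), write, toy_sync_for_device(). Calling
toy_sync_for_device() straight after toy_map() fails, because the CPU never
took ownership back.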
Thanks for the clarification - the wording in DMA-API.txt rather implies
that in the DMA_TO_DEVICE case you only have to sync the updated data
/after/ writing it. For the sake of purely getting pages flushed, would
it be more reasonable then to call dma_map_single() followed immediately
by dma_unmap_single_attrs() with DMA_ATTR_SKIP_CPU_SYNC? Since we know
the IOMMU can never write back to memory (ones that can are a different
issue) it would be nice to be able to skip the extra invalidations
somehow, without too heinously abusing the API.