[RFC] Describing arbitrary bus mastering relationships in DT

Grant Grundler grundler at chromium.org
Mon May 12 13:02:13 PDT 2014

On Mon, May 12, 2014 at 11:29 AM, Stephen Warren <swarren at wwwdotorg.org> wrote:
>> But the important point here is that you wouldn't use the dma-mapping
>> API to manage this. First of all, the CPU is special anyway, but also
>> if you do a device-to-device DMA into the GPU address space and that
>> ends up being redirected to memory through the IOMMU, you still wouldn't
>> manage the I/O page tables through the interfaces of the device doing the
>> DMA, but through some private interface of the GPU.
> Why not? If something wants to DMA to a memory region, irrespective of
> whether the GPU MMU (or any MMU) is in between those master transactions
> and the RAM or not, surely the driver should always use the DMA mapping
> API to set that up?

No.  As one of the contributors to the DMA API, I'm pretty confident
that is not what it is for. It _could_ be used that way but it's
certainly not the original
design. P2P transactions are different since they are "less likely"
(depends on arch and implementation) to participate in the CPU cache
coherency or even be visible to the CPU. In particular, think of the case
where all transactions are locally routed behind a PCI bridge (or
other fabric) and CPU/IOMMU/RAM controller never sees those.
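
To make the routing case concrete, here is a toy user-space model in C
(not kernel code). The struct, the window values and every function name
are invented for illustration. It only assumes the usual bridge behaviour:
a transaction issued on the secondary side whose address falls inside the
bridge's memory window is claimed locally, and anything else is forwarded
upstream where an IOMMU or RAM controller could see it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a bridge memory window (not a kernel structure). */
struct toy_bridge {
    uint64_t win_base;   /* first bus address decoded on the secondary side */
    uint64_t win_limit;  /* last bus address decoded on the secondary side */
};

enum toy_route {
    TOY_ROUTE_LOCAL,     /* claimed by a peer device behind the bridge */
    TOY_ROUTE_UPSTREAM   /* forwarded towards the IOMMU / RAM controller */
};

/* Decide where a transaction issued by a device behind the bridge ends up. */
static enum toy_route toy_route_from_secondary(const struct toy_bridge *br,
                                               uint64_t bus_addr)
{
    assert(br->win_base <= br->win_limit);
    if (bus_addr >= br->win_base && bus_addr <= br->win_limit)
        return TOY_ROUTE_LOCAL;
    return TOY_ROUTE_UPSTREAM;
}

/* Only upstream traffic has any chance of being seen by CPU/IOMMU/RAM. */
static bool toy_iommu_sees(const struct toy_bridge *br, uint64_t bus_addr)
{
    return toy_route_from_secondary(br, bus_addr) == TOY_ROUTE_UPSTREAM;
}
```

In this model no amount of IOMMU page table setup affects the locally
routed case, which is why a mapping made through the DMA API would not
describe it.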

A long-standing real example is the drivers/scsi/sym53c8xx_2 driver.
The "scripts" engine needs to access local (on chip) RAM through PCI
bus transactions. So it uses its own PCI BAR registers to sort that
out. In essence, "local PCI physical" addresses.  I believe the code is
in sym_iomap_device(). No CPU or IOMMU is involved with this.  This
driver otherwise uses the DMA API for all other host RAM accesses.
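
The two kinds of address can be sketched like this. It is a simplified
user-space illustration, not the code from sym53c8xx_2: struct toy_hba,
both helpers and the fixed iova_offset are assumptions made up for the
example, and the offset merely stands in for whatever translation a DMA
mapping call would set up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical adapter state (names invented for this sketch). */
struct toy_hba {
    uint64_t ram_bar_bus;  /* bus address held in the on-chip RAM BAR */
    uint32_t ram_size;     /* size of the on-chip RAM in bytes */
    uint64_t iova_offset;  /* stand-in for a DMA mapping translation */
};

/*
 * On-chip RAM: the address handed to the script engine is derived from
 * the device's own BAR. No mapping step, no CPU, no IOMMU.
 */
static uint64_t toy_onchip_addr(const struct toy_hba *hba, uint32_t offset)
{
    assert(offset < hba->ram_size);
    return hba->ram_bar_bus + offset;
}

/*
 * Host RAM: the address must come out of a mapping step, modelled here
 * as a fixed offset applied to the CPU physical address.
 */
static uint64_t toy_host_dma_addr(const struct toy_hba *hba, uint64_t cpu_phys)
{
    return cpu_phys + hba->iova_offset;
}
```

The point of keeping two helpers is that only the second one corresponds
to something the DMA API would manage.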

> Anything else just means using custom APIs, and
> isn't the whole point of the DMA mapping API to provide a standard API
> for that purpose?

Yes and no. Yes, the generic DMA API exists to provide DMA mapping
services that hide the IOMMU (or the lack of one) AND provide cache
coherency for DMA transactions to RAM that is visible to the CPU cache.

In general, I'd argue transactions that route through an IOMMU need to
work with the existing DMA API. Historically those transactions are
routed "upstream" - away from other IO devices and thus not the case
referred to here.

If the IOMMU is part of a "graph topology" (vs a tree topology), the
drivers will have to know whether or not to use the DMA API to access
the intended target.

