[PATCH 05/13] iommu/io-pgtable-arm: Allow appropriate DMA API use
Robin Murphy
robin.murphy at arm.com
Tue Aug 4 07:47:13 PDT 2015
Hi Laurent,
[ +RMK, as his patch is indirectly involved here ]
On 04/08/15 14:16, Laurent Pinchart wrote:
> Hi Will and Robin,
>
> Thank you for the patch.
>
> On Monday 03 August 2015 14:25:47 Will Deacon wrote:
>> From: Robin Murphy <Robin.Murphy at arm.com>
>>
>> Currently, users of the LPAE page table code are (ab)using dma_map_page()
>> as a means to flush page table updates for non-coherent IOMMUs. Since
>> from the CPU's point of view, creating IOMMU page tables *is* passing
>> DMA buffers to a device (the IOMMU's page table walker), there's little
>> reason not to use the DMA API correctly.
>>
>> Allow IOMMU drivers to opt into DMA API operations for page table
>> allocation and updates by providing their appropriate device pointer.
>> The expectation is that an LPAE IOMMU should have a full view of system
>> memory, so use streaming mappings to avoid unnecessary pressure on
>> ZONE_DMA, and treat any DMA translation as a warning sign.
>
> I like the idea of doing this in core code rather than in individual drivers,
> but I believe we're not using the right API. Please see below.
Perhaps this could have one of my trademark "for now"s - the aim of this
series is really just to stop the per-driver hacks proliferating, as per
Russell's comment[1]. I left the semi-artificial DMA==phys restriction
for expediency, since it follows the current usage and keeps the code
changes minimal. With this series in place I'd be happy to go back and
try a full-blown DMA conversion if and when a real need shows up, but I
think it would significantly complicate all the current software page
table walking.
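To illustrate what I mean (purely a sketch, not proposed code - the
helper and mask names here are made up for the example, and it assumes
the io-pgtable-arm.c context): with a full conversion the PTEs would
hold genuine DMA addresses, so every level of a software walk would
need something like the below to get back to a CPU pointer, and
dma_to_phys() isn't generally available to drivers - hence the SWIOTLB
note in the Kconfig hunk below.

	/* Hypothetical: only viable where dma_to_phys() exists */
	static arm_lpae_iopte *iopte_deref_dma(arm_lpae_iopte pte,
					       struct device *dev)
	{
		dma_addr_t next = pte & IOPTE_ADDR_MASK; /* made-up mask */

		return phys_to_virt(dma_to_phys(dev, next));
	}

Whereas today we can keep the simple phys==DMA assumption and just
sanity-check it at allocation time.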
>> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
>> Signed-off-by: Will Deacon <will.deacon at arm.com>
>> ---
>> drivers/iommu/Kconfig | 3 +-
>> drivers/iommu/io-pgtable-arm.c | 107 +++++++++++++++++++++++++++++---------
>> drivers/iommu/io-pgtable.h | 3 ++
>> 3 files changed, 89 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index f1fb1d3ccc56..d77a848d50de 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -23,7 +23,8 @@ config IOMMU_IO_PGTABLE
>> config IOMMU_IO_PGTABLE_LPAE
>> bool "ARMv7/v8 Long Descriptor Format"
>> select IOMMU_IO_PGTABLE
>> - depends on ARM || ARM64 || COMPILE_TEST
>> + # SWIOTLB guarantees a dma_to_phys() implementation
>> + depends on ARM || ARM64 || (COMPILE_TEST && SWIOTLB)
>> help
>> Enable support for the ARM long descriptor pagetable format.
>> This allocator supports 4K/2M/1G, 16K/32M and 64K/512M page
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index 4e460216bd16..28cca8a652f9 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -200,12 +200,76 @@ typedef u64 arm_lpae_iopte;
>>
>> static bool selftest_running = false;
>>
>> +static dma_addr_t __arm_lpae_dma_addr(struct device *dev, void *pages)
>> +{
>> + return phys_to_dma(dev, virt_to_phys(pages));
>> +}
>> +
>> +static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
>> + struct io_pgtable_cfg *cfg)
>> +{
>> + struct device *dev = cfg->iommu_dev;
>> + dma_addr_t dma;
>> + void *pages = alloc_pages_exact(size, gfp | __GFP_ZERO);
>> +
>> + if (!pages)
>> + return NULL;
>> +
>> + if (dev) {
>> + dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE);
>> + if (dma_mapping_error(dev, dma))
>> + goto out_free;
>> + /*
>> + * We depend on the IOMMU being able to work with any physical
>> + * address directly, so if the DMA layer suggests it can't by
>> + * giving us back some translation, that bodes very badly...
>> + */
>> + if (dma != __arm_lpae_dma_addr(dev, pages))
>> + goto out_unmap;
>
> Why do we need to create a mapping at all then ? Because
> dma_sync_single_for_device() requires it ?
We still need to expose this new table to the device. If we never update
it, we'll never have reason to call dma_sync_, but we definitely want
the IOMMU to know there's a page of invalid PTEs there.
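To spell that out, here's a much-simplified sketch of how the helpers
from this patch get used (install_new_table is a made-up name, and the
real arm_lpae code around this is more involved): the walker can
dereference the new level as soon as its parent pointer is published,
so the zeroed (invalid) PTEs must already be visible by then.

	static int install_new_table(arm_lpae_iopte *ptep, size_t tblsz,
				     struct io_pgtable_cfg *cfg, void *cookie)
	{
		arm_lpae_iopte *cptep;

		/* Allocate, zero, and (if cfg->iommu_dev) map the new level */
		cptep = __arm_lpae_alloc_pages(tblsz, GFP_ATOMIC, cfg);
		if (!cptep)
			return -ENOMEM;

		/* Only now publish the table pointer to the walker */
		__arm_lpae_set_pte(ptep, __pa(cptep) | ARM_LPAE_PTE_TYPE_TABLE,
				   cfg, cookie);
		return 0;
	}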
>> + }
>> +
>> + return pages;
>> +
>> +out_unmap:
>> + dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
>> + dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
>> +out_free:
>> + free_pages_exact(pages, size);
>> + return NULL;
>> +}
>> +
>> +static void __arm_lpae_free_pages(void *pages, size_t size,
>> + struct io_pgtable_cfg *cfg)
>> +{
>> + struct device *dev = cfg->iommu_dev;
>> +
>> + if (dev)
>> + dma_unmap_single(dev, __arm_lpae_dma_addr(dev, pages),
>> + size, DMA_TO_DEVICE);
>> + free_pages_exact(pages, size);
>> +}
>> +
>> +static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
>> + struct io_pgtable_cfg *cfg, void *cookie)
>> +{
>> + struct device *dev = cfg->iommu_dev;
>> +
>> + *ptep = pte;
>> +
>> + if (dev)
>> + dma_sync_single_for_device(dev, __arm_lpae_dma_addr(dev, ptep),
>> + sizeof(pte), DMA_TO_DEVICE);
>
> This is what I believe to be an API abuse. The dma_sync_single_for_device()
> API is meant to pass ownership of a buffer to the device. Unless I'm mistaken,
> once that's done the CPU isn't allowed to touch the buffer anymore until
> dma_sync_single_for_cpu() is called to get ownership of the buffer back. Sure,
> it might work on many ARM systems, but we really should be careful not to use
> APIs as delicate as DMA mapping and cache handling for purposes different than
> what they explicitly allow.
>
> It might be that I'm wrong and that the streaming DMA API allows this exact
> kind of usage, but I haven't found a clear indication of that in the
> documentation. It could also be that all implementations would support it
> today, and that we would then consider it should be explicitly allowed by the
> API. In both cases a documentation patch would be welcome.
TBH, I was largely going by Russell's Tegra patch[2], which similarly
elides any sync_*_for_cpu. In reality, since everything is DMA_TO_DEVICE
and the IOMMU itself can't modify the page tables[3], I can't think of
any situation where sync_*_for_cpu would actually do anything.
From reading this part of DMA-API.txt:
Notes: You must do this:
- Before reading values that have been written by DMA from the device
(use the DMA_FROM_DEVICE direction)
- After writing values that will be written to the device using DMA
(use the DMA_TO_DEVICE) direction
- before *and* after handing memory to the device if the memory is
DMA_BIDIRECTIONAL
I would conclude that since a sync using DMA_TO_DEVICE *before* writing
is not a "must", it's probably unnecessary.
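In code terms (placeholder names, just restating those notes, not code
from this patch):

	/* CPU wrote data the device will read: push it out to the device */
	dma_sync_single_for_device(dev, buf_dma, len, DMA_TO_DEVICE);

	/* Device wrote data the CPU will read: discard stale CPU copies first */
	dma_sync_single_for_cpu(dev, buf_dma, len, DMA_FROM_DEVICE);

For the page tables only the first case ever arises - the walker never
writes them - which is why the patch is DMA_TO_DEVICE throughout and
never needs a sync_*_for_cpu.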
Robin.
[1]:http://article.gmane.org/gmane.linux.kernel/2005551
[2]:http://article.gmane.org/gmane.linux.ports.tegra/23150
[3]:Yes, there may generally be exceptions to that, but not in the
context of this code. Unless the Renesas IPMMU does something I don't
know about?