[PATCH] iommu/dma: Stop getting dma_32bit_pfn wrong

Robin Murphy robin.murphy at arm.com
Tue Nov 15 08:22:39 PST 2016


On 15/11/16 11:49, Joerg Roedel wrote:
> On Fri, Nov 11, 2016 at 06:30:45PM +0000, Robin Murphy wrote:
>> iommu_dma_init_domain() was originally written under the misconception
>> that dma_32bit_pfn represented some sort of size limit for IOVA domains.
>> Since the truth is almost the exact opposite of that, rework the logic
>> and comments to reflect its real purpose of optimising lookups when
>> allocating from a subset of the available space.
>>
>> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
>> ---
>>  drivers/iommu/dma-iommu.c | 23 ++++++++++++++++++-----
>>  1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index c5ab8667e6f2..ae045a14b530 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -139,6 +139,7 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>>  {
>>  	struct iova_domain *iovad = cookie_iovad(domain);
>>  	unsigned long order, base_pfn, end_pfn;
>> +	bool pci = dev && dev_is_pci(dev);
>>  
>>  	if (!iovad)
>>  		return -ENODEV;
>> @@ -161,19 +162,31 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>>  		end_pfn = min_t(unsigned long, end_pfn,
>>  				domain->geometry.aperture_end >> order);
>>  	}
>> +	/*
>> +	 * PCI devices may have larger DMA masks, but still prefer allocating
>> +	 * within a 32-bit mask to avoid DAC addressing. Such limitations don't
>> +	 * apply to the typical platform device, so for those we may as well
>> +	 * leave the cache limit at the top of the range they're likely to use.
>> +	 */
>> +	if (pci)
>> +		end_pfn = min_t(unsigned long, end_pfn,
>> +				DMA_BIT_MASK(32) >> order);
> 
> Question: does it actually hurt platform devices to follow the same
> allocation strategy as PCI devices? I mean, does it hurt enough to
> special-case the code here?

It hurts them in the sense that they get absolutely no benefit from
trying a lower limit first (a platform device simply has as many address
lines wired to the interconnect as it needs or can drive), so there is
nothing to offset the cost of every allocation degenerating into
try-fail-and-try-again once the sub-32-bit space fills up with, say, big
GPU mappings. The maintenance cost of a couple of one-liner if
statements doesn't seem big enough to justify denying a significant set
of devices straightforward and optimal allocation behaviour all the
time, every time; bear in mind that the first products really using this
code in anger (MT8173) don't have PCI at all.
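
To make that cost concrete, here's a rough standalone sketch of the
two-pass pattern in question (toy code with made-up names, not anything
from the kernel tree), modelling the point where the low 4GB of IOVA
space has already been consumed:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for the real allocator: pretend everything
 * below 32 bits is already taken by big mappings. */
static uint64_t toy_alloc_iova(uint64_t size, uint64_t limit)
{
	static uint64_t next_free = 1ULL << 32;	/* low space exhausted */
	uint64_t iova = next_free;

	if (next_free + size > limit)
		return 0;			/* allocation failure */
	next_free += size;
	return iova;
}

static uint64_t alloc_for_device(uint64_t size, uint64_t dma_mask,
				 bool is_pci)
{
	uint64_t iova = 0;

	/* PCI devices prefer sub-32-bit IOVAs to avoid DAC addressing... */
	if (is_pci && dma_mask > DMA_BIT_MASK(32))
		iova = toy_alloc_iova(size, DMA_BIT_MASK(32));
	/* ...but fall back to the full mask; a platform device goes
	 * straight here and never pays for the failed first pass. */
	if (!iova)
		iova = toy_alloc_iova(size, dma_mask);
	return iova;
}

int main(void)
{
	printf("PCI device:      %#llx\n", (unsigned long long)
	       alloc_for_device(0x1000, DMA_BIT_MASK(48), true));
	printf("platform device: %#llx\n", (unsigned long long)
	       alloc_for_device(0x1000, DMA_BIT_MASK(48), false));
	return 0;
}

Once the low space is full, the PCI device pays for a failed first pass
on every allocation, while the platform device goes straight to its full
mask; that wasted first pass is exactly what the if (pci) check spares
platform devices.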

I suppose it might bear saying that, as a Cambridge Water customer, I do
tend to view PCI in the same light as USB/I2C/etc.: just another
external bus controller on the outside edge of 'the system'. From that
perspective, avoiding DAC overhead really does look like the minority
special case, although I appreciate that the view from the x86 angle is
rather different. I also suspect that, whether we want to or not, we may
continue to see more on-chip 'PCI' devices which don't really need this
either, because they're really just platform devices talking directly to
whatever coherent interconnect, with a config space bolted on somewhere.

Robin.


