[RFC PATCH 04/30] iommu/arm-smmu-v3: Add support for PCI ATS

Jean-Philippe Brucker jean-philippe.brucker at arm.com
Tue May 23 04:21:51 PDT 2017


On 23/05/17 09:41, Leizhen (ThunderTown) wrote:
> On 2017/2/28 3:54, Jean-Philippe Brucker wrote:
>> PCIe devices can implement their own TLB, named Address Translation Cache
>> (ATC). Steps involved in the use and maintenance of such caches are:
>>
>> * Device sends an Address Translation Request for a given IOVA to the
>>   IOMMU. If the translation succeeds, the IOMMU returns the corresponding
>>   physical address, which is stored in the device's ATC.
>>
>> * Device can then use the physical address directly in a transaction.
>>   A PCIe device does so by setting the TLP AT field to 0b10 - translated.
>>   The SMMU might check that the device is allowed to send translated
>>   transactions, and let it pass through.
>>
>> * When an address is unmapped, CPU sends a CMD_ATC_INV command to the
>>   SMMU, that is relayed to the device.
>>
>> In theory, this doesn't require a lot of software intervention. The IOMMU
>> driver needs to enable ATS when adding a PCI device, and send an
>> invalidation request when unmapping. Note that this invalidation is
>> allowed to take up to a minute, according to the PCIe spec. In
>> addition, the invalidation queue on the ATC side is fairly small, 32 by
>> default, so we cannot keep many invalidations in flight (see ATS spec
>> section 3.5, Invalidate Flow Control).
>>
>> Handling these constraints properly would require to postpone
>> invalidations, and keep the stale mappings until we're certain that all
>> devices forgot about them. This requires major work in the page table
>> managers, and is therefore not done by this patch.
>>
>>   Range calculation
>>   -----------------
>>
>> The invalidation packet itself is a bit awkward: range must be naturally
>> aligned, which means that the start address is a multiple of the range
>> size. In addition, the size must be a power of two number of 4k pages. We
>> have a few options to enforce this constraint:
>>
>> (1) Find the smallest naturally aligned region that covers the requested
>>     range. This is simple to compute and only takes one ATC_INV, but it
>>     will spill on lots of neighbouring ATC entries.
>>
>> (2) Align the start address to the region size (rounded up to a power of
>>     two), and send a second invalidation for the next range of the same
>>     size. Still not great, but reduces spilling.
>>
>> (3) Cover the range exactly with the smallest number of naturally aligned
>>     regions. This would be interesting to implement but as for (2),
>>     requires multiple ATC_INV.
>>
>> As I suspect ATC invalidation packets will be a very scarce resource,
>> we'll go with option (1) for now, and only send one big invalidation.
>>
>> Note that with io-pgtable, the unmap function is called for each page, so
>> this doesn't matter. The problem shows up when sharing page tables with
>> the MMU.
> Suppose this is true, I'd like to choose option (2). Because the worst cases of
> both (1) and (2) will not be happened, but the code of (2) will look clearer.
> And (2) is technically more acceptable.

I agree that (2) is a bit clearer, but the question is of performance
rather than readability. I'd like to see some benchmarks or experiment on
my own before switching to a two-invalidation system.

Intuitively one big invalidation will result in more ATC trashing and will
bring overall device performance down. But then according to the PCI spec,
ATC invalidations are grossly expensive, they have an upper bound of a
minute. I agree that this is highly improbable and might depend on the
range size, but purely from an architectural standpoint, reducing the
number of ATC invalidation requests is the priority, because this is much
worse than any performance slow-down incurred by ATC trashing. And for the
moment I can only base my decisions on the architecture.

So I'd like to keep (1) for now, and update it to (2) (or even (3)) once
we have more hardware to experiment with.

Thanks,
Jean




More information about the linux-arm-kernel mailing list