[PATCH 0/7] Add PCI ATS support to SMMUv3

Jean-Philippe Brucker jean-philippe.brucker at arm.com
Thu Jun 1 05:23:41 PDT 2017


On 31/05/17 16:27, Nate Watterson wrote:
> Hi Jean-Philippe,
> 
> On 5/24/2017 2:01 PM, Jean-Philippe Brucker wrote:
>> PCIe devices can implement their own TLB, named Address Translation Cache
>> (ATC). In order to support Address Translation Service (ATS), the
>> following changes are needed in software:
>>
>> * Enable ATS on endpoints when the system supports it. Both PCI root
>>    complex and associated SMMU must implement the ATS protocol.
>>
>> * When unmapping an IOVA, send an ATC invalidate request to the endpoint
>>    in addition to the usual SMMU IOTLB invalidations.
>>
>> I previously sent this as part of a lengthy RFC [1] adding SVM (ATS +
>> PASID + PRI) support to SMMUv3. The next PASID/PRI version is almost
>> ready, but isn't likely to get merged because it needs hardware testing,
>> so I will send it later. PRI depends on ATS, but ATS should be useful on
>> its own.
>>
>> Without PASID and PRI, ATS is used for accelerating transactions. Instead
>> of having all memory accesses go through SMMU translation, the endpoint
>> can translate IOVA->PA once, store the result in its ATC, then issue
>> subsequent transactions using the PA, partially bypassing the SMMU. So in
>> theory it should be faster while keeping the advantages of an IOMMU,
>> namely scatter-gather and access control.
>>
>> The ATS patches can now be tested on some hardware, even though the lack
>> of compatible PCI endpoints makes it difficult to assess what performance
>> optimizations we need. That's why the ATS implementation is a bit rough at
>> the moment, and we will work on optimizing things like invalidation ranges
>> later.
> 
> Sinan and I have tested this series on a QDF2400 development platform
> using a PCIe exerciser card as the ATS capable endpoint. We were able
> to verify that ATS requests complete with a valid translated address
> and that DMA transactions using the pre-translated address "bypass"
> the SMMU. Testing ATC invalidations was a bit more difficult as we
> could not figure out how to get the exerciser card to automatically
> send the completion message. We ended up having to write a debugger
> script that would monitor the CMDQ and tell the exerciser to send
> the completion when a hanging CMD_SYNC following a CMD_ATC_INV was
> detected. Hopefully we'll get some real ATS capable endpoints to
> test with soon.
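
The hanging CMD_SYNC follows from the unmap sequence in the cover letter.
The IOTLB invalidation is handled inside the SMMU, but a CMD_SYNC queued
after a CMD_ATC_INV cannot complete until the endpoint returns its
invalidation completion. The toy model below is a sketch of that ordering
only. The opcode names, struct cmd and the helper functions are
hypothetical and are not the arm-smmu-v3 driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified command kinds; real SMMUv3 commands carry many more fields. */
enum cmd_op { CMD_TLBI, CMD_ATC_INV, CMD_SYNC };

struct cmd {
	enum cmd_op op;
	bool done; /* has this command completed? */
};

/*
 * Queue the commands for one unmap: an IOTLB invalidation, then an ATC
 * invalidation if ATS is enabled for the endpoint, then a CMD_SYNC.
 * q must have room for 3 entries. Returns the number of commands queued.
 */
size_t build_unmap_cmds(struct cmd *q, bool ats_enabled)
{
	size_t n = 0;

	/* Simplification: the SMMU completes its own TLB invalidation at once. */
	q[n++] = (struct cmd){ .op = CMD_TLBI, .done = true };
	/* An ATC invalidation completes only when the endpoint replies. */
	if (ats_enabled)
		q[n++] = (struct cmd){ .op = CMD_ATC_INV, .done = false };
	q[n++] = (struct cmd){ .op = CMD_SYNC, .done = false };
	return n;
}

/* The endpoint sends its invalidation completion for every pending ATC_INV. */
void endpoint_complete(struct cmd *q, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (q[i].op == CMD_ATC_INV)
			q[i].done = true;
}

/*
 * The CMD_SYNC at index sync_idx completes only once every earlier command
 * has completed; until then the driver keeps polling the queue.
 */
bool sync_can_complete(const struct cmd *q, size_t sync_idx)
{
	assert(q[sync_idx].op == CMD_SYNC);
	for (size_t i = 0; i < sync_idx; i++)
		if (!q[i].done)
			return false;
	return true;
}
```

If the endpoint never replies, sync_can_complete() stays false forever.
That is the situation the debugger script worked around by telling the
exerciser to send the completion.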

That's still a big step forward from my software tests; thanks a lot for
the report. If you get around to testing a real endpoint, there are a few
data points that would be really useful to compare, if only to see whether
enabling ATS is viable at all, or whether we end up stuck in
queue_poll_cons under normal conditions:

* ATS enabled/disabled in endpoint
* ATSCHK enabled/disabled in SMMU
* Invalidation duration when the ATC entry is present/absent and the range
  is big/small

Knowing this would tell us whether more work is needed now on invalidation
sizing, batching or postponing, or whether we can optimize later.
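
On invalidation sizing, a PCIe ATS invalidation describes its range as a
naturally aligned, power-of-two block of 4 KiB pages. As a result, a small
unaligned unmap can expand into a much larger invalidation. The following
is a minimal sketch of computing the covering block, assuming a 4 KiB
grain. The struct and function names are hypothetical, and this is not
the code from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for one ATC invalidation command. */
struct atc_inv {
	uint64_t addr;     /* start address, aligned to the span */
	unsigned int size; /* log2 of the number of 4 KiB pages covered */
};

/* 1-based index of the most significant set bit; 0 when x == 0. */
static unsigned int msb_pos(uint64_t x)
{
	/* __builtin_clzll is a GCC/Clang builtin, undefined for x == 0. */
	return x ? 64u - (unsigned int)__builtin_clzll(x) : 0u;
}

/*
 * Smallest naturally aligned, power-of-two block of 4 KiB pages that
 * covers [iova, iova + len).
 */
struct atc_inv atc_inv_range(uint64_t iova, uint64_t len)
{
	const unsigned int grain_shift = 12; /* assumed 4 KiB grain */
	uint64_t first, last, span_mask;
	unsigned int log2_span;

	assert(len > 0);
	first = iova >> grain_shift;
	last = (iova + len - 1) >> grain_shift;

	/* The highest bit in which first and last differ sets the span. */
	log2_span = msb_pos(first ^ last);
	span_mask = (1ULL << log2_span) - 1; /* log2_span <= 52 here */
	first &= ~span_mask;

	return (struct atc_inv){ .addr = first << grain_shift, .size = log2_span };
}
```

For example, atc_inv_range(0x7000, 0x2000) gives addr 0 and size 4, so an
8 KiB unmap becomes a 64 KiB invalidation. A two-page unmap straddling
0x100000000 grows to 2^21 pages (8 GiB). Whether to accept this
over-invalidation or split such ranges into several smaller commands is
one of the sizing trade-offs the measurements above could help settle.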

Thanks,
Jean
