[PATCH 0/8] io-pgtable lock removal

John Garry john.garry at huawei.com
Thu Jun 15 05:25:31 PDT 2017


On 15/06/2017 01:40, Ray Jui via iommu wrote:

Hi Robin,

Wangzhou tested this patchset on our SMMUv3-based development board with
a 10G PCI NIC.

Currently we see a ~17% performance (throughput) drop when enabling the 
SMMU, but only a ~8% drop with your patchset.

FYI, for our integrated storage and network adapter, we see a big 
performance hit (maybe 40%) when enabling the SMMU with or without the 
patchset. Leizhen has been doing some investigation on this.
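
As a rough way to put the Ethernet figures quoted below on the same
relative-drop footing as the percentages above, here is a minimal Python
sketch. The throughput numbers are copied from Ray's report; the dictionary
keys and the helper names (drop_percent, drops) are made up for illustration
and are not part of any tool used in the testing.

```python
# Throughput in Gbps, copied from Ray's report quoted below.
# Keys are illustrative labels: direction plus number of TCP sessions.
baseline = {"TX x1": 29.7, "TX x4": 30.5, "TX x8": 28.0,
            "RX x1": 15.0, "RX x4": 33.7, "RX x8": 36.0}
iommu_unpatched = {"TX x1": 15.2, "TX x4": 14.3, "TX x8": 13.0,
                   "RX x1": 7.88, "RX x4": 13.2, "RX x8": 12.6}
iommu_patched = {"TX x1": 21.4, "TX x4": 30.5, "TX x8": 21.3,
                 "RX x1": 7.7, "RX x4": 20.1, "RX x8": 27.1}


def drop_percent(base, measured):
    """Relative drop versus the no-IOMMU baseline, in percent."""
    return 100.0 * (base - measured) / base


def drops(results):
    """Per-test drop (rounded to 0.1%) for one set of IOMMU results."""
    return {test: round(drop_percent(baseline[test], value), 1)
            for test, value in results.items()}


if __name__ == "__main__":
    for label, results in (("IOMMU, unpatched", iommu_unpatched),
                           ("IOMMU, patched", iommu_patched)):
        print(label)
        for test, drop in drops(results).items():
            print(f"  {test}: {drop}% drop")
```

With these inputs, drops(iommu_unpatched)["RX x8"] comes out at 65.0 and
drops(iommu_patched)["TX x4"] at 0.0, i.e. the patched TX x4 run matches the
no-IOMMU figure.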

Thanks,
John

> Hi Robin,
>
> I have applied this patch series on top of v4.12-rc4, and ran various
> Ethernet and NVMf target throughput tests on it.
>
> To give you some background of my setup:
>
> The system is an ARMv8-based system with 8 cores. It has various PCIe
> root complexes that can be used to connect to PCIe endpoint devices
> including NIC cards and NVMe SSDs.
>
> I'm particularly interested in the performance of the PCIe root complex
> that connects to the NIC card, and during my test, IOMMU is
> enabled/disabled against that particular PCIe root complex. The root
> complexes connected to NVMe SSDs remain unchanged (i.e., without IOMMU).
>
> For Ethernet throughput over a 50G link:
>
> Note that in the multi-session TCP tests, each session is spread across
> different CPU cores for optimized performance.
>
> Without IOMMU:
>
> TX TCP x1 - 29.7 Gbps
> TX TCP x4 - 30.5 Gbps
> TX TCP x8 - 28 Gbps
>
> RX TCP x1 - 15 Gbps
> RX TCP x4 - 33.7 Gbps
> RX TCP x8 - 36 Gbps
>
> With IOMMU, but without your latest patch:
>
> TX TCP x1 - 15.2 Gbps
> TX TCP x4 - 14.3 Gbps
> TX TCP x8 - 13 Gbps
>
> RX TCP x1 - 7.88 Gbps
> RX TCP x4 - 13.2 Gbps
> RX TCP x8 - 12.6 Gbps
>
> With IOMMU and your latest patch:
>
> TX TCP x1 - 21.4 Gbps
> TX TCP x4 - 30.5 Gbps
> TX TCP x8 - 21.3 Gbps
>
> RX TCP x1 - 7.7 Gbps
> RX TCP x4 - 20.1 Gbps
> RX TCP x8 - 27.1 Gbps
>
> For the NVMf target test with 4 SSDs (fio-based, random read, 4k,
> 8 jobs):
>
> Without IOMMU:
>
> IOPS = 1080K
>
> With IOMMU, but without your latest patch:
>
> IOPS = 520K
>
> With IOMMU and your latest patch:
>
> IOPS = 500K ~ 850K (a lot of variation observed during the same test run)
>
> As you can see, performance has improved significantly with this patch
> series! That is very impressive!
>
> However, it is still off, compared to the test runs without the IOMMU.
> I'm wondering if more improvement is expected.
>
> In addition, a much larger throughput variation is observed in the tests
> with these latest patches, when multiple CPUs are involved. I'm
> wondering if that is caused by some remaining lock in the driver?
>
> Also, on a few occasions, I observed the following message during the
> test, when multiple cores are involved:
>
> arm-smmu 64000000.mmu: TLB sync timed out -- SMMU may be deadlocked
>
> Thanks,
>
> Ray
>
> On 6/9/17 12:28 PM, Nate Watterson wrote:
>> Hi Robin,
>>
>> On 6/8/2017 7:51 AM, Robin Murphy wrote:
>>> Hi all,
>>>
>>> Here's the cleaned up nominally-final version of the patches everybody's
>>> keen to see. #1 is just a non-critical thing-I-spotted-in-passing fix,
>>> #2-#4 do some preparatory work (and bid farewell to everyone's least
>>> favourite bit of code, hooray!), and #5-#8 do the dirty deed itself.
>>>
>>> The branch I've previously shared has been updated too:
>>>
>>>    git://linux-arm.org/linux-rm  iommu/pgtable
>>>
>>> All feedback welcome, as I'd really like to land this for 4.13.
>>>
>>
>> I tested the series on a QDF2400 development platform and see notable
>> performance improvements particularly in workloads that make concurrent
>> accesses to a single iommu_domain.
>>
>>> Robin.
>>>
>>>
>>> Robin Murphy (8):
>>>    iommu/io-pgtable-arm-v7s: Check table PTEs more precisely
>>>    iommu/io-pgtable-arm: Improve split_blk_unmap
>>>    iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
>>>    iommu/io-pgtable: Introduce explicit coherency
>>>    iommu/io-pgtable-arm: Support lockless operation
>>>    iommu/io-pgtable-arm-v7s: Support lockless operation
>>>    iommu/arm-smmu: Remove io-pgtable spinlock
>>>    iommu/arm-smmu-v3: Remove io-pgtable spinlock
>>>
>>>   drivers/iommu/arm-smmu-v3.c        |  36 ++-----
>>>   drivers/iommu/arm-smmu.c           |  48 ++++------
>>>   drivers/iommu/io-pgtable-arm-v7s.c | 173 +++++++++++++++++++++------------
>>>   drivers/iommu/io-pgtable-arm.c     | 190 ++++++++++++++++++++++++-------------
>>>   drivers/iommu/io-pgtable.h         |   6 ++
>>>   5 files changed, 268 insertions(+), 185 deletions(-)
>>>
>>