[PATCH 0/3] IOVA allocation improvements for iommu-dma

Thu Mar 16 06:18:57 PDT 2017

On Wed, Mar 15, 2017 at 7:03 PM, Robin Murphy <robin.murphy at arm.com> wrote:
> Hi all,
>
> Here's the first bit of lock contention removal to chew on - feedback
> welcome! Note that for the current users of the io-pgtable framework,
> this is most likely to simply push more contention onto the io-pgtable
> lock, so may not show a great improvement alone. Will and I both have
> rough proof-of-concept implementations of lock-free io-pgtable code
> which we need to sit down and agree on at some point, hopefullt fairly
> soon.

Thanks for working on this.
As you said, it's indeed pushing lock contention down to pgtable lock from
iova rbtree lock but now morethan lock I see issue is with yielding CPU while
waiting for tlb_sync. Below are some numbers.

I have tweaked '__arm_smmu_tlb_sync' in SMMUv2 driver i.e basically removed
cpu_relax() and udelay() to make it a busy loop.

Before: 1.1 Gbps
With your patches: 1.45Gbps
With your patches + busy loop in tlb_sync: 7Gbps

If we reduce pgtable contention a bit
With your patches + busy loop in tlb_sync + Iperf threads reduced to 8
from 16: ~9Gbps

So looks like along with pgtable lock, some optimization can be done
to tlb_sync code as well.

Thanks,
Sunil.