[PATCH 0/8] io-pgtable lock removal
Ray Jui
ray.jui at broadcom.com
Thu Jul 6 11:14:00 PDT 2017
Hi Will,
On 7/6/17 8:08 AM, Will Deacon wrote:
> Hi Ray,
>
> Thanks for testing this, and sorry it didn't help.
>
> On Wed, Jul 05, 2017 at 04:24:22PM -0700, Ray Jui wrote:
>> On 7/5/17 1:41 AM, Will Deacon wrote:
>>> On Tue, Jul 04, 2017 at 06:45:17PM -0700, Ray Jui wrote:
>>>> Has anything functionally changed between PATCH v2 and v1? I'm seeing a
>>>> very different L2 throughput with v2 (in general a lot worse with v2 vs.
>>>> v1); however, I'm currently unable to reproduce the TLB sync timed out
>>>> issue with v2 (without the patch from Will's email).
>>>>
>>>> It could also be something else that has changed in my setup, but so far
>>>> I have not yet been able to spot anything wrong in the setup.
>>>
>>> There were fixes, and that initially involved a DSB that was found to be
>>> expensive. The patches queued in -next should have that addressed, so please
>>> use those (or my for-joerg/arm-smmu/updates branch).
>>>
>>
>> That was my bad yesterday. I was in a rush and the setup was incorrect.
>>
>> I redo my Ethernet performance test with both PATCH v1 and v2 today, and
>> can confirm the performance is consistent between v1 and v2 as expected.
>>
>> I also made sure the following message can still be reproduced with
>> patch set v2:
>>
>> arm-smmu 64000000.mmu: TLB sync timed out -- SMMU may be deadlocked
>>
>> Then I proceeded to apply your patch that attempt to fix the deadlock
>> issue.
>
> [...]
>
>> However, with the fix patch, I can still see the deadlock message when I
>> have > 32 iperf TX threads active in the system:
>
> Damn. We've been going over this today and the only plausible theory seems
> to be that concurrent TLB syncs are causing the completion to be pushed out,
> resulting in timeouts.
>
> Can you try this patch below, instead of the one I sent before, please?
>
> Thanks,
>
> Will
>
> --->8
Good news! I can confirm that with the new patch below, the error log of
"TLB sync timed out" is now gone. I ran 20 iterations of the iperf
client test with 64 threads spread across 8 CPUs. The error message used
to coming out very quickly with just one iteration. But now it's
completely gone.
At the same time, I do seem to observe a slight impact on performance
when multiple threads are used, but I guess that is expected, given that
a spin lock is added to protect the TLB sync.
Great work and thanks for the fix!
Ray
>
> From bbf3737c29e3d18f539998a66f42878ac91cde97 Mon Sep 17 00:00:00 2001
> From: Will Deacon <will.deacon at arm.com>
> Date: Thu, 6 Jul 2017 15:55:48 +0100
> Subject: [PATCH] iommu/arm-smmu: Reintroduce locking around TLB sync
> operations
>
> Commit 523d7423e21b ("iommu/arm-smmu: Remove io-pgtable spinlock")
> removed the locking used to serialise map/unmap calls into the io-pgtable
> code from the ARM SMMU driver. This is good for performance, but opens
> us up to a nasty race with TLB syncs because the TLB sync register is
> shared within a context bank (or even globally for stage-2 on SMMUv1).
>
> There are two cases to consider:
>
> 1. A CPU can be spinning on the completion of a TLB sync, take an
> interrupt which issues a subsequent TLB sync, and then report a
> timeout on return from the interrupt.
>
> 2. A CPU can be spinning on the completion of a TLB sync, but other
> CPUs can continuously issue additional TLB syncs in such a way that
> the backoff logic reports a timeout.
>
> Rather than fix this by spinning for completion of prior TLB syncs before
> issuing a new one (which may suffer from fairness issues on large systems),
> instead reintroduce locking around TLB sync operations in the ARM SMMU
> driver.
>
> Fixes: 523d7423e21b ("iommu/arm-smmu: Remove io-pgtable spinlock")
> Cc: Robin Murphy <robin.murphy at arm.com>
> Reported-by: Ray Jui <ray.jui at broadcom.com>
> Signed-off-by: Will Deacon <will.deacon at arm.com>
> ---
> drivers/iommu/arm-smmu.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index b446183b3015..770abd247f40 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -400,6 +400,8 @@ struct arm_smmu_device {
>
> u32 cavium_id_base; /* Specific to Cavium */
>
> + spinlock_t global_sync_lock;
> +
> /* IOMMU core code handle */
> struct iommu_device iommu;
> };
> @@ -436,7 +438,7 @@ struct arm_smmu_domain {
> struct arm_smmu_cfg cfg;
> enum arm_smmu_domain_stage stage;
> struct mutex init_mutex; /* Protects smmu pointer */
> - spinlock_t cb_lock; /* Serialises ATS1* ops */
> + spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */
> struct iommu_domain domain;
> };
>
> @@ -602,9 +604,12 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu)
> {
> void __iomem *base = ARM_SMMU_GR0(smmu);
> + unsigned long flags;
>
> + spin_lock_irqsave(&smmu->global_sync_lock, flags);
> __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC,
> base + ARM_SMMU_GR0_sTLBGSTATUS);
> + spin_unlock_irqrestore(&smmu->global_sync_lock, flags);
> }
>
> static void arm_smmu_tlb_sync_context(void *cookie)
> @@ -612,9 +617,12 @@ static void arm_smmu_tlb_sync_context(void *cookie)
> struct arm_smmu_domain *smmu_domain = cookie;
> struct arm_smmu_device *smmu = smmu_domain->smmu;
> void __iomem *base = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx);
> + unsigned long flags;
>
> + spin_lock_irqsave(&smmu_domain->cb_lock, flags);
> __arm_smmu_tlb_sync(smmu, base + ARM_SMMU_CB_TLBSYNC,
> base + ARM_SMMU_CB_TLBSTATUS);
> + spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
> }
>
> static void arm_smmu_tlb_sync_vmid(void *cookie)
> @@ -1925,6 +1933,7 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
>
> smmu->num_mapping_groups = size;
> mutex_init(&smmu->stream_map_mutex);
> + spin_lock_init(&smmu->global_sync_lock);
>
> if (smmu->version < ARM_SMMU_V2 || !(id & ID0_PTFS_NO_AARCH32)) {
> smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_L;
>
More information about the linux-arm-kernel
mailing list