[PATCH 1/5] iommu/arm-smmu-v3: defer the execution of TLBI* commands to reduce lock contention

Leizhen (ThunderTown) thunder.leizhen at huawei.com
Tue Aug 22 18:21:33 PDT 2017



On 2017/8/22 23:41, Joerg Roedel wrote:
> On Mon, Jun 26, 2017 at 09:38:46PM +0800, Zhen Lei wrote:
>> -static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
>> +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent, int optimize)
>>  {
>>  	if (queue_full(q))
>>  		return -ENOSPC;
>>  
>>  	queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
>> -	queue_inc_prod(q);
>> +
>> +	/*
>> +	 * We don't want too many commands to be delayed, as this may
>> +	 * cause the following sync command to wait for a long time.
>> +	 */
>> +	if (optimize && (++q->nr_delay < CMDQ_MAX_DELAYED)) {
>> +		queue_inc_swprod(q);
>> +	} else {
>> +		queue_inc_prod(q);
>> +		q->nr_delay = 0;
>> +	}
>> +
>>  	return 0;
>>  }
>>  
>> @@ -909,6 +928,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>>  static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  				    struct arm_smmu_cmdq_ent *ent)
>>  {
>> +	int optimize = 0;
>>  	u64 cmd[CMDQ_ENT_DWORDS];
>>  	unsigned long flags;
>>  	bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
>> @@ -920,8 +940,17 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  		return;
>>  	}
>>  
>> +	/*
>> +	 * All TLBI commands should be followed by a sync command later.
>> +	 * The CFGI commands are the same, but they are rarely executed.
>> +	 * So only optimize TLBI commands for now, to reduce the "if" checks.
>> +	 */
>> +	if ((ent->opcode >= CMDQ_OP_TLBI_NH_ALL) &&
>> +	    (ent->opcode <= CMDQ_OP_TLBI_NSNH_ALL))
>> +		optimize = 1;
>> +
>>  	spin_lock_irqsave(&smmu->cmdq.lock, flags);
>> -	while (queue_insert_raw(q, cmd) == -ENOSPC) {
>> +	while (queue_insert_raw(q, cmd, optimize) == -ENOSPC) {
>>  		if (queue_poll_cons(q, false, wfe))
>>  			dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>>  	}
> 
> This doesn't look correct. How do you make sure that a given IOVA range
> is flushed before the addresses are reused?
Hi, Joerg:
	It's actually guaranteed by the upper layer functions, for example:

	static int arm_lpae_unmap(...)
		...
		/* __arm_lpae_unmap() indirectly calls arm_smmu_cmdq_issue_cmd()
		 * to invalidate the TLBs */
		unmapped = __arm_lpae_unmap(data, iova, size, lvl, ptep);

		/* the tlb_sync waits until all the TLBI operations have finished */
		if (unmapped)
			io_pgtable_tlb_sync(&data->iop);
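
	To make that a bit more concrete: the tlb_sync callback registered by the
SMMUv3 driver issues a CMD_SYNC on the same command queue, so it is ordered
after any delayed TLBI entries. A rough sketch from memory (the helper name
below is only illustrative; the exact code differs between kernel versions):

	/* sketch only -- simplified from the arm-smmu-v3 driver */
	static void arm_smmu_tlb_sync_sketch(void *cookie)
	{
		struct arm_smmu_domain *smmu_domain = cookie;
		struct arm_smmu_cmdq_ent cmd = { .opcode = CMDQ_OP_CMD_SYNC };

		/*
		 * CMD_SYNC is not an "optimized" opcode, so inserting it also
		 * publishes any previously delayed TLBI entries (the PROD
		 * register is updated), and the issue path then waits for the
		 * command queue to be consumed before returning.
		 */
		arm_smmu_cmdq_issue_cmd(smmu_domain->smmu, &cmd);
	}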

	
	I also described this in the next patch (2/5), quoted below:

Some people might ask: Is it safe to do so? The answer is yes. The standard
processing flow is:
	alloc iova
	map
	process data
	unmap
	tlb invalidation and sync
	free iova

What must be guaranteed is that the "free iova" step happens after the "unmap"
and "tlbi + sync" steps, which is exactly what we are doing right now. This
ensures that all TLB entries for an iova range have been invalidated before
the iova is reallocated.
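
To illustrate that ordering with a caller-side example (the function name and
iova bookkeeping below are hypothetical, loosely following the dma-iommu style):

	#include <linux/iommu.h>
	#include <linux/iova.h>

	/* Illustrative sketch only: "free iova" happens strictly after
	 * unmap + TLBI + sync, so the range cannot be reallocated while
	 * stale TLB entries may still exist.
	 */
	static void example_dma_unmap(struct iommu_domain *domain,
				      struct iova_domain *iovad,
				      dma_addr_t iova, size_t size)
	{
		/* unmap the range; the io-pgtable code issues the TLBI
		 * commands and a final tlb_sync before this returns */
		iommu_unmap(domain, iova, size);

		/* only now is the iova returned to the allocator */
		free_iova_fast(iovad, iova_pfn(iovad, iova),
			       size >> iova_shift(iovad));
	}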

Best regards,
	LeiZhen

> 
> 
> Regards,
> 
> 	Joerg
> 
> 
> .
> 

-- 
Thanks!
Best Regards



