[PATCH 02/13] dmaengine: edma: Optimize memcpy operation

Peter Ujfalusi peter.ujfalusi at ti.com
Wed Oct 14 08:02:18 PDT 2015


On 10/14/2015 05:41 PM, Vinod Koul wrote:
> On Wed, Oct 14, 2015 at 04:12:13PM +0300, Peter Ujfalusi wrote:
>> @@ -1320,41 +1317,92 @@ static struct dma_async_tx_descriptor *edma_prep_dma_memcpy(
>>  	struct dma_chan *chan, dma_addr_t dest, dma_addr_t src,
>>  	size_t len, unsigned long tx_flags)
>>  {
>> -	int ret;
>> +	int ret, nslots;
>>  	struct edma_desc *edesc;
>>  	struct device *dev = chan->device->dev;
>>  	struct edma_chan *echan = to_edma_chan(chan);
>> -	unsigned int width;
>> +	unsigned int width, pset_len;
>>  
>>  	if (unlikely(!echan || !len))
>>  		return NULL;
>>  
>> -	edesc = kzalloc(sizeof(*edesc) + sizeof(edesc->pset[0]), GFP_ATOMIC);
>> +	if (len < SZ_64K) {
>> +		/*
>> +		 * Transfer size less than 64K can be handled with one paRAM
>> +		 * slot. ACNT = length
>> +		 */
>> +		width = len;
>> +		pset_len = len;
>> +		nslots = 1;
>> +	} else {
>> +		/*
>> +		 * Transfer size bigger than 64K will be handled with maximum of
>> +		 * two paRAM slots.
>> +		 * slot1: ACNT = 32767, length1: (length / 32767)
>> +		 * slot2: the remaining amount of data.
>> +		 */
>> +		width = SZ_32K - 1;
>> +		pset_len = rounddown(len, width);
>> +		/* One slot is enough for lengths multiple of (SZ_32K -1) */
> 
> Hmm so does this mean if I have 140K transfer, it will do two 64K for 1st
> slot and 12K in second slot ?

Not exactly. If the size is less than 64K, it can be done with one 'burst', but
if it is bigger we need two sets of transfers:
1. blocks of 32K - 1 bytes
2. the remaining data

So in the case of 140K:
4 x (32K - 1) byte blocks followed by a 12292-byte (roughly 12K) remainder.

> 
> Is there a limit on 'blocks' of 64K we can do here?

The limit is 32767 of these 32K blocks.

The 64K burst is only possible if the whole transfer is less than 64K.
With the ACNT counter we can transfer up to 64K - 1 bytes. If that is not
enough we need to use the BCNT counter as well, and for that to work the
distance between the start of 'slot n' and the start of 'slot n+1' needs to be
less than 32K, since that distance (the B index) is a signed 16-bit value.
This is why we first transfer 32K 'blocks' and then the remaining data.

-- 
Péter



More information about the linux-arm-kernel mailing list