non barrier versions of dma_map functions

Wed Dec 9 19:32:19 EST 2009

Russell King - ARM Linux wrote:
> On Mon, Dec 07, 2009 at 11:37:21AM -0800, adharmap at codeaurora.org wrote:
>> We have a situation where we need to dma map multiple cached buffers for a
>> single dma transaction.
>>
>> The current DMA api suggests the use of dma_map_single for cache
>> consistency. On ARMv7 it performs the necessary cache-operations and calls
>> data sync barrier instruction (DSB). In our case we would be executing
>> multiple DSB instruction before starting the dma operation - we need
>> memory to be consistent only after we map the last buffer.
> 
> Is it a problem and do you have numbers to illustrate why it is a
> problem, or is this just theory?

Here are numbers from a test run on an ARMv7-based device.
It kmallocs N buffers of size 'size', dirties the cache by writing
to them, and calls dma_map_single, which calls the arch-specific clean
operations with and without DSB. In the "without DSB" case a single DSB is
executed after the last buffer is mapped. Times are in microseconds; delta
is the extra time taken with the per-buffer DSB, relative to the
"without DSB" time.

size	N	map_single	map_single w/o dsb	delta
128	16	8		5			60%
512	16	9		6			50%
512	32	15		8			88%
512	48	20		11			82%
512	64	27		14			93%
64	4	4		3			33%
64	8	4		3			33%
64	16	7		4			75%
64	32	12		4			200%
64	48	17		6			183%
64	64	21		7			200%
1024	16	9		7			29%
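
For anyone checking the arithmetic, the delta column can be reproduced as
(map_single - map_single w/o dsb) / (map_single w/o dsb), rounded to the
nearest percent. A minimal sketch in plain user-space C; the function name
delta_percent is made up for illustration and is not part of the test code:

```c
#include <assert.h>

/*
 * Extra cost of issuing a DSB per buffer, expressed as a percentage of
 * the "without DSB" time and rounded to the nearest whole percent.
 * Both arguments are times in microseconds.
 * NOTE: delta_percent is a hypothetical helper, not from the test itself.
 */
static int delta_percent(int with_dsb_us, int without_dsb_us)
{
    assert(without_dsb_us > 0);
    return (100 * (with_dsb_us - without_dsb_us) + without_dsb_us / 2)
           / without_dsb_us;
}
```

For example, the size=512, N=32 row gives delta_percent(15, 8), i.e.
7/8 = 87.5%, which rounds to the 88% shown in the table.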

These buffer sizes and values of N are very close to the real-world sizes
the framebuffer driver handles. Cases where N is large are the most
common.

Clearly, we could benefit from nobarrier versions of the cache
operations, and we could use them in scatter-gather mappings as well.
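
To make the difference between the two sequences concrete, here is a rough
user-space model of the ordering only. It is not kernel code: model_clean_range
and model_dsb are hypothetical stand-ins that just count calls, and struct
model_buf and both map_* functions are invented for this sketch. The real
thing would use the arch cache-clean routine and the DSB instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Counters standing in for the real cache maintenance work (assumption). */
static unsigned long clean_calls;
static unsigned long barrier_calls;

/* Hypothetical stand-in for the arch-specific cache clean of one buffer. */
static void model_clean_range(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
    clean_calls++;
}

/* Hypothetical stand-in for a data synchronization barrier (DSB). */
static void model_dsb(void)
{
    barrier_calls++;
}

/* Illustrative buffer descriptor, not a kernel structure. */
struct model_buf {
    void *addr;
    size_t len;
};

/* Current behaviour: each mapping cleans its buffer and then issues a DSB. */
static void map_each_with_barrier(const struct model_buf *bufs, size_t n)
{
    assert(bufs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        model_clean_range(bufs[i].addr, bufs[i].len);
        model_dsb();
    }
}

/* Proposed behaviour: clean every buffer, then issue one DSB at the end. */
static void map_all_then_barrier(const struct model_buf *bufs, size_t n)
{
    assert(bufs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        model_clean_range(bufs[i].addr, bufs[i].len);
    if (n > 0)
        model_dsb();
}
```

With N buffers, the first function issues N barriers and the second issues
one, while the number of clean operations stays the same. That is the only
difference between the two columns in the table above.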

Abhijeet