vc04_services: performance tests of dma_map_sg/dma_map_page/dmac_map_area

Sun Oct 23 20:55:22 PDT 2016

Hi,

I implemented an alternative version of vchiq that uses dma_map_sg
instead of dma_map_page.  Benchmarks for all three variants are below.

As you can see, for larger message sizes the dma_map_sg
implementation is faster than the original, unportable dmac_map_area
implementation.  So it's really a question of how important
portability and getting this driver merged upstream are vs.
performance with small message sizes.

Test                       dmac_map_area   dma_map_page   dma_map_sg
vchiq_test -b 4 10000      51us/iter       76us/iter      76us/iter
vchiq_test -b 8 10000      70us/iter       82us/iter      91us/iter
vchiq_test -b 16 10000     94us/iter       118us/iter     121us/iter
vchiq_test -b 32 10000     146us/iter      173us/iter     187us/iter
vchiq_test -b 64 10000     263us/iter      328us/iter     299us/iter
vchiq_test -b 128 10000    529us/iter      631us/iter     595us/iter
vchiq_test -b 256 10000    2285us/iter     2275us/iter    2001us/iter
vchiq_test -b 512 10000    4372us/iter     4616us/iter    4123us/iter

So for message sizes >= 64KB, dma_map_sg is faster than dma_map_page.

For message sizes >= 256KB, dma_map_sg is the fastest of the three
implementations.
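
For anyone who wants to check those two thresholds against the table,
here is a small user-space C sketch.  It is not part of vchiq or of the
patch; the struct and the function names (pct_vs, crossover,
print_report) are made up for illustration, and the only data in it is
the set of numbers from the table above.

```c
#include <stdio.h>

/* One row of the benchmark table; times are in us/iter. */
struct row {
    int size;               /* the -b argument to vchiq_test */
    double dmac_map_area;
    double dma_map_page;
    double dma_map_sg;
};

static const struct row rows[] = {
    {   4,   51,   76,   76 },
    {   8,   70,   82,   91 },
    {  16,   94,  118,  121 },
    {  32,  146,  173,  187 },
    {  64,  263,  328,  299 },
    { 128,  529,  631,  595 },
    { 256, 2285, 2275, 2001 },
    { 512, 4372, 4616, 4123 },
};

#define NROWS ((int)(sizeof(rows) / sizeof(rows[0])))

/* Percentage by which t differs from base; negative means t is faster. */
double pct_vs(double t, double base)
{
    return (t - base) / base * 100.0;
}

/*
 * Smallest size from which dma_map_sg is strictly faster than the other
 * implementation in every remaining row.  Compares against dma_map_page
 * if use_page is nonzero, else against dmac_map_area.  Returns 0 if
 * dma_map_sg does not win even in the last row.
 */
int crossover(int use_page)
{
    int first = 0;

    for (int i = NROWS - 1; i >= 0; i--) {
        double other = use_page ? rows[i].dma_map_page
                                : rows[i].dmac_map_area;
        if (rows[i].dma_map_sg >= other)
            break;
        first = rows[i].size;
    }
    return first;
}

/* Print dma_map_sg relative to the other two implementations. */
void print_report(void)
{
    for (int i = 0; i < NROWS; i++)
        printf("-b %3d: sg vs dmac_map_area %+6.1f%%, sg vs dma_map_page %+6.1f%%\n",
               rows[i].size,
               pct_vs(rows[i].dma_map_sg, rows[i].dmac_map_area),
               pct_vs(rows[i].dma_map_sg, rows[i].dma_map_page));
}
```

Calling print_report() from a main() prints the per-size differences;
crossover(1) and crossover(0) give the thresholds against dma_map_page
and dmac_map_area respectively.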

If anybody is interested, I can submit the dma_map_sg patch, which
would make the driver portable, tonight or possibly sometime tomorrow.

Yet another alternative would be to keep dmac_map_area for arm(32) and
use either dma_map_page or dma_map_sg for arm64.
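
Roughly, that split could be a compile-time choice along these lines.
This is only a sketch of the idea, not code from the driver:
vchiq_map_strategy is a hypothetical helper that just names the
primitive each build would use, and the fallback for other
architectures is my assumption.

```c
/*
 * Hypothetical helper: returns the name of the mapping primitive a
 * given build would use.  CONFIG_ARM64 and CONFIG_ARM are the usual
 * kernel config symbols; everything else here is illustrative.
 */
const char *vchiq_map_strategy(void)
{
#if defined(CONFIG_ARM64)
    /* Portable DMA API path; dma_map_page would fit here as well. */
    return "dma_map_sg";
#elif defined(CONFIG_ARM)
    /* Keep the existing 32-bit ARM-only path. */
    return "dmac_map_area";
#else
    /* Assumption: any other architecture takes the portable path. */
    return "dma_map_sg";
#endif
}
```

The cost of doing it this way is two code paths to maintain and test,
with the unportable one still in the tree.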

Thanks.