vc04_services: performance tests of dma_map_sg/dma_map_page/dmac_map_area
Michael Zoran
mzoran at crowfest.net
Sun Oct 23 20:55:22 PDT 2016
Hi,
I implemented an alternative version of vchiq that uses dma_map_sg
instead of dma_map_page, and I've included some benchmarks below.
As you can see, for larger message sizes the dma_map_sg implementation
is faster than the original, unportable dmac_map_area implementation.
So it's really a question of how important portability and getting
this driver checked into upstream are, versus performance with small
message sizes.
Test (us/iter)           dmac_map_area  dma_map_page  dma_map_sg
vchiq_test -b 4 10000               51            76          76
vchiq_test -b 8 10000               70            82          91
vchiq_test -b 16 10000              94           118         121
vchiq_test -b 32 10000             146           173         187
vchiq_test -b 64 10000             263           328         299
vchiq_test -b 128 10000            529           631         595
vchiq_test -b 256 10000           2285          2275        2001
vchiq_test -b 512 10000           4372          4616        4123
So for message sizes >= 64KB, dma_map_sg is faster than dma_map_page.
For message sizes >= 256KB, dma_map_sg is the fastest of the three
implementations.
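A quick userspace sketch, not part of the driver, that re-derives those two claims from the table above. The numbers are copied by hand from the table; the struct layout and the helper names (fastest(), sg_beats_page_from_kb(), sg_overhead_pct(), print_summary()) are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* The three implementations compared in the benchmark. */
enum impl { IMPL_MAP_AREA, IMPL_MAP_PAGE, IMPL_MAP_SG };

static const char *const impl_name[] = { "dmac_map_area", "dma_map_page", "dma_map_sg" };

/* One table row: message size (the -b argument, in KB) and us/iter per implementation. */
struct bench_row {
    int size_kb;
    int us[3]; /* indexed by enum impl */
};

/* Figures copied from the vchiq_test results (10000 iterations each). */
static const struct bench_row rows[] = {
    {   4, {   51,   76,   76 } },
    {   8, {   70,   82,   91 } },
    {  16, {   94,  118,  121 } },
    {  32, {  146,  173,  187 } },
    {  64, {  263,  328,  299 } },
    { 128, {  529,  631,  595 } },
    { 256, { 2285, 2275, 2001 } },
    { 512, { 4372, 4616, 4123 } },
};

#define NROWS ((int)(sizeof(rows) / sizeof(rows[0])))

/* Fastest implementation for one row; on a tie the earlier enum value wins. */
static enum impl fastest(const struct bench_row *r)
{
    enum impl best = IMPL_MAP_AREA;
    for (int i = IMPL_MAP_PAGE; i <= IMPL_MAP_SG; i++)
        if (r->us[i] < r->us[best])
            best = (enum impl)i;
    return best;
}

/* Smallest size from which dma_map_sg beats dma_map_page at every larger
 * tested size; -1 if it does not win even at the largest size. */
static int sg_beats_page_from_kb(void)
{
    int from = -1;
    for (int i = NROWS - 1; i >= 0; i--) {
        if (rows[i].us[IMPL_MAP_SG] >= rows[i].us[IMPL_MAP_PAGE])
            break;
        from = rows[i].size_kb;
    }
    return from;
}

/* Cost of dma_map_sg relative to dmac_map_area, in percent (positive = slower). */
static double sg_overhead_pct(const struct bench_row *r)
{
    return 100.0 * (r->us[IMPL_MAP_SG] - r->us[IMPL_MAP_AREA]) / r->us[IMPL_MAP_AREA];
}

/* Print one line per message size plus the dma_map_sg/dma_map_page crossover. */
static void print_summary(void)
{
    for (int i = 0; i < NROWS; i++)
        printf("%4d KB: fastest %-13s  dma_map_sg vs dmac_map_area %+6.1f%%\n",
               rows[i].size_kb, impl_name[fastest(&rows[i])],
               sg_overhead_pct(&rows[i]));
    printf("dma_map_sg beats dma_map_page from %d KB upward\n",
           sg_beats_page_from_kb());
}
```

Calling print_summary() from a main() prints one line per message size. With these numbers, dma_map_sg comes out roughly 49% slower than dmac_map_area at 4KB and about 6% faster at 512KB, which is the portability versus small-message performance trade-off described above.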
If anybody is interested, I can submit the dma_map_sg patch, which
would add portability, tonight or possibly sometime tomorrow.
Yet another alternative would be to keep dmac_map_area for arm(32) and
use either dma_map_page or dma_map_sg for arm64.
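The per-architecture option could look roughly like a compile-time switch. This is only an illustration under stated assumptions: the enum and pick_strategy() are hypothetical, the sketch keys off the compiler's __arm__/__aarch64__ macros so it builds in userspace (real driver code would more likely test a Kconfig symbol such as CONFIG_ARM64), and it picks dma_map_sg for the portable path only because that was the faster portable option from 64KB upward in the table above.

```c
#include <stdio.h>

/* Hypothetical labels for the three mapping strategies discussed in this thread. */
enum vchiq_map_strategy { MAP_DMAC_MAP_AREA, MAP_DMA_MAP_PAGE, MAP_DMA_MAP_SG };

static const char *const strategy_name[] = {
    "dmac_map_area", "dma_map_page", "dma_map_sg"
};

/*
 * Compile-time choice: keep the ARM-only dmac_map_area path on 32-bit ARM and
 * use the portable DMA API (dma_map_sg here; dma_map_page is the other option)
 * everywhere else, including arm64.
 * Assumption: __arm__ / __aarch64__ stand in for the kernel's Kconfig checks.
 */
static enum vchiq_map_strategy pick_strategy(void)
{
#if defined(__arm__) && !defined(__aarch64__)
    return MAP_DMAC_MAP_AREA;
#else
    return MAP_DMA_MAP_SG;
#endif
}

/* Report which strategy this build selected. */
static void report_strategy(void)
{
    printf("vchiq would map buffers with %s\n", strategy_name[pick_strategy()]);
}
```

Returning MAP_DMA_MAP_PAGE from the #else branch instead gives the other arm64 variant mentioned above.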
Thanks.
More information about the linux-rpi-kernel mailing list