Kirkwood PCI(e) write performance and DMA engine support for copy_{to, from}_user?

Wolfgang Wegner ww-ml at gmx.de
Wed Sep 8 04:35:58 EDT 2010


On Tue, Sep 07, 2010 at 03:14:08PM -0400, Nicolas Pitre wrote:
> The STM instruction means store-multiple, i.e. it takes a set of
> registers and writes them to memory in one go.  You could try using
> memset() which should be optimized to use STM in that case:
[...]

Thank you for the explanation and code example!

Using write() (dd if=/dev/zero of=/dev/fb0 bs=1024 count=8192), the
throughput was slightly lower in both cases, but this may well be
due to some other overhead (0.035s -> 0.052s for RAM, 1.299s -> 1.310s
for the PCI frame buffer).

Using your assembler code, I get almost double the throughput for RAM
(0.035s -> 0.018s, meaning around 466 MBytes/s), but a system lockup
with my PCI device. Hmm...
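
For reference, the throughput figures follow directly from the timings
above. This is only a rough sketch: it assumes every run moved the same
8 MiB as the dd invocation (bs=1024 * count=8192) and counts 1 MByte as
10^6 bytes. The helper name throughput_mb_s is made up for illustration.

```python
# Sketch: derive throughput from the timings quoted in this mail.
# Assumption: each run transferred 8 MiB (bs=1024 * count=8192, as in
# the dd command); 1 MByte is taken as 10**6 bytes.

BYTES_MOVED = 1024 * 8192  # 8 MiB, from the dd invocation


def throughput_mb_s(seconds, nbytes=BYTES_MOVED):
    """Return throughput in MBytes/s for nbytes moved in `seconds`."""
    return nbytes / seconds / 1e6


# Timings in seconds, as reported above. "baseline" is the earlier
# measurement that the other numbers are compared against.
timings = {
    "RAM, baseline": 0.035,
    "RAM, write() via dd": 0.052,
    "RAM, STM assembler": 0.018,
    "PCI frame buffer, baseline": 1.299,
    "PCI frame buffer, write() via dd": 1.310,
}

for label, seconds in timings.items():
    print(f"{label}: {throughput_mb_s(seconds):.1f} MBytes/s")
```

With these assumptions the 0.018s run works out to roughly 466 MBytes/s,
matching the figure above, while the PCI frame buffer stays in the
single-digit MBytes/s range.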

I will now set up some eval boards to see whether I can get an
"off-the-shelf" framebuffer with a stock PCI graphics card up and
running for comparison.

Regards,
Wolfgang




More information about the linux-arm-kernel mailing list