Kirkwood PCI(e) write performance and DMA engine support for copy_{to, from}_user?

Nicolas Pitre nico at fluxnic.net
Tue Sep 7 14:38:30 EDT 2010


On Tue, 7 Sep 2010, saeed bishara wrote:

> On Tue, Sep 7, 2010 at 10:58 AM, saeed bishara <saeed.bishara at gmail.com> wrote:
> > On Mon, Sep 6, 2010 at 5:14 PM, Wolfgang Wegner <ww-ml at gmx.de> wrote:
> >> On Mon, Sep 06, 2010 at 03:03:47PM +0100, Russell King - ARM Linux wrote:
> >>> On Mon, Sep 06, 2010 at 12:02:44PM +0200, Wolfgang Wegner wrote:
> >>> > Mapping the PCI memory space via mmap() resulted in some
> >>> > disappointing ~6.5 MBytes/second. I tried to modify page
> >>> > protection to pgprot_writecombine or pgprot_cached, but while
> >>> > this did reproducably change performance, it was only in
> >>> > some sub-percentage range. I am not sure if I understand
> >>> > correctly how other framebuffers handle this, but it seems
> >>> > the "raw" mmapped write performance is not cared about too
> >>> > much or maybe not that bad with most x86 chip sets?
> >>> > However, the idea left over after some trying and looking
> >>> > around is to use the DMA engine to speed up write() (and
> >>> > also read(), but this is not so important) system calls
> >>> > instead of using mmap.
> >>>
> >>> Framebuffer applications such as Xorg/Qt do not use read/write calls
> >>> to access their buffers because that will be painfully slow.
> >>
> >> BTW, the throughput I get with a "dd if=bitmap of=/dev/fb0 bs=512"
> >> is the same I get from my test application writing longwords
> >> sequentially to the mmapped frame buffer.
> > I'm not sure the writecombine is enabled properly, can you test that on DRAM?
> > you can do that be reserving some memory (mem=<dram size - 8M>), then
> > try to test throughput with and without writecombine.
> >
> also, in order to sent bursts, make sure that the stm instruction is
> used, preferred with 8 registers with address aligned to 8*4 bytes.

The copy_from_user code (through the write() system call) already takes 
care of this with large enough buffers.  Of course with mmap()'d memory 
you are responsible for optimizing this yourself.


Nicolas



More information about the linux-arm-kernel mailing list