Kirkwood PCI(e) write performance and DMA engine support for copy_{to, from}_user?
Wolfgang Wegner
ww-ml at gmx.de
Thu Sep 9 12:21:35 EDT 2010
On Wed, Sep 08, 2010 at 10:35:58AM +0200, Wolfgang Wegner wrote:
>
> Using your assembler code, I get almost double throughput (0.035s->
> 0.018s, meaning around 466 MBytes/s) for RAM and a system lockup
> for my PCI device. Hmm...
>
> I will now set up some eval boards to see if I get an "off-the-shelf"
> framebuffer with a stock PCI graphics card up and running for a
> comparison.

The only memory-mapped PCI device I managed to get running in the
PCIe->PCI bridge eval board was the FPGA evaluation board, together
with the manufacturer-supplied evaluation code. (The PCI graphics
cards were either too old (5V) or ATI-based, and the ATI driver
seems to have been "improved" in a way that makes it fail
without a BIOS. *sigh*)

With the FPGA evaluation board I get:
- around 38 MBytes/second with Nicolas' inline assembly code
- around 6 MBytes/second with any other C code (mmapped) as
  well as with write() via dd
These numbers do not change regardless of whether I use
ioremap_{wc,nocache,cached} or
pgprot_writecombine/pgprot_noncached.
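
For reference, a minimal sketch of how MBytes/second figures like
these can be derived from a timed copy. This is not the actual test
code; the helper names mbytes_per_second() and timed_copy() are made
up for illustration, 1 MByte is taken as 10^6 bytes, and plain
memcpy() stands in for whatever copy routine is being compared.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* clock_gettime() under strict -std=c99/c11 */
#endif
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: bytes moved in `seconds` -> MBytes/s (10^6 bytes). */
static double mbytes_per_second(size_t bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;  /* avoid dividing by zero on a too-coarse timer */
    return (double)bytes / seconds / 1e6;
}

/*
 * Hypothetical helper: time a single memcpy() of `len` bytes and return
 * the elapsed wall-clock time in seconds. For a device test, `dst` would
 * be the mmapped region; here it is just an ordinary pointer.
 */
static double timed_copy(void *dst, const void *src, size_t len)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Typical use would be mbytes_per_second(len, timed_copy(dst, src, len)).
As a sanity check of the arithmetic, an assumed 8 MiB buffer copied in
0.018s works out to roughly 466 MBytes/s.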

So the main problem seems to be either our board implementation
of the PCIe->PCI bridge or the FPGA. However, I am still wondering
how a framebuffer-based application can attain reasonable performance,
as (to my understanding) in most cases it will not be possible to
use such throughput-optimized assembly code.

On a side note: can anybody give a hint on how to enable
ASYNC_CORE/ASYNC_MEMCPY? I see the options in crypto/async_tx/Kconfig
but cannot find them via menuconfig. I would still like to try
using the DMA engine for transferring complete frames...

Regards,
Wolfgang

PS: another PCI device I tried via the PCIe->PCI bridge was
an Intel 82574L GBit NIC, which was able to reach >600MBit/s
throughput when tested with netio or netperf.