Kirkwood PCI(e) write performance and DMA engine support for copy_{to, from}_user?

Leon Woestenberg leon.woestenberg at gmail.com
Wed Sep 15 19:39:54 EDT 2010


Hello Wolfgang,

On Tue, Sep 14, 2010 at 9:03 AM, Wolfgang Wegner <ww-ml at gmx.de> wrote:
> On Mon, Sep 13, 2010 at 07:10:59PM +0200, Leon Woestenberg wrote:
>> Hello Wolfgang,
>>
>> On Thu, Sep 9, 2010 at 6:21 PM, Wolfgang Wegner <ww-ml at gmx.de> wrote:
>> > On Wed, Sep 08, 2010 at 10:35:58AM +0200, Wolfgang Wegner wrote:
>> >>
>> > With the FPGA evaluation board I get:
>> > - around 38 MBytes/second with Nicolas' inline assembly code
>> > - around 6 MBytes/second with any other C code (mmapped) as
>> >  well as write() via dd
>> >
>> > So the main problem seems to be either our board implementation
>> > of the PCIe->PCI bridge or the FPGA. However, I am still wondering
>> > how a framebuffer-based application can attain reasonable performance,
>> >
>> Having implemented a framebuffer demo on an FPGA recently using PCI
>> Express, I think the main performance gain is made by having the DMA
>> done by the endpoint (FPGA) rather than by the CPU.
>
> this is what I read all around, however, I do not see how this
> can improve anything when using an mmap()ed frame buffer for
> pixel-oriented operations...
>
I haven't yet seen a SoC that can reach full PCIe bandwidth by using
its own DMA controller to push data out over the PCIe bus. Most of them
do not support the kind of large read requests that an endpoint can
issue, typically 512 or even 4096 bytes per read request over PCI
Express. Couple that with an efficient scatter-gather DMA (SGDMA)
controller in the endpoint and you reach full PCIe bandwidth, no cycles
wasted.

mmap() backed by a good SoC DMA controller may come close, but the
maximum payload size is usually smaller (128 bytes).
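
To put rough numbers on the payload sizes mentioned above, here is a
minimal sketch of per-packet link efficiency as a function of payload
size. The helper name and the 20-byte overhead figure are my own
assumptions for illustration (Gen1/Gen2 framing, 3DW header, no ECRC).
With 64-bit addressing the header is 16 bytes instead of 12.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed per-TLP overhead in bytes on a Gen1/Gen2 link:
 *   2 framing (STP + END) + 2 sequence number
 *   + 12 header (3DW) + 4 LCRC = 20
 * Hypothetical constant for this sketch, not taken from any driver.
 */
#define TLP_OVERHEAD_BYTES 20

/* Fraction of the bytes on the wire that are payload for one TLP. */
double tlp_efficiency(size_t payload_bytes)
{
    assert(payload_bytes > 0);
    return (double)payload_bytes /
           (double)(payload_bytes + TLP_OVERHEAD_BYTES);
}
```

Under these assumptions tlp_efficiency(128) is about 0.865,
tlp_efficiency(512) about 0.962 and tlp_efficiency(4096) about 0.995.
This only captures framing overhead. It ignores 8b/10b encoding,
ACK and flow-control DLLPs, and per-request latency, and a read request
larger than the negotiated maximum payload is answered with several
completions, so treat it as an upper bound per packet.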

Also, most SoC DMA controllers are too limited to set up the kind of
DMA you want.

> This is why I thought about reverting to write() and simply transfer
> complete frames, which would be sufficient for about 90% of my
> application scenarios - and for the other 10% I could live with
> the lower performance.
>
Anything that works. How much burden do you want to put on the CPU though?

Regards,
-- 
Leon



More information about the linux-arm-kernel mailing list