Low network throughput on i.MX28

Jörg Krause joerg.krause at embedded.rocks
Sat Nov 5 05:06:50 PDT 2016


Hello Vinod,

as recommended by Stefan Wahren, I'm turning to you about this issue.
Please see below...

On Sat, 2016-11-05 at 12:33 +0100, Stefan Wahren wrote:
> Hi Jörg,
> 
> > Jörg Krause <joerg.krause at embedded.rocks> wrote on 4 November 2016
> > at 23:42:
> > 
> > 
> > Hi Stefan,
> > 
> > sorry, I forgot the link in the previous mail.
> > 
> > On Fri, 2016-11-04 at 20:30 +0100, Stefan Wahren wrote:
> > > Hi Jörg,
> > > 
> > > > Jörg Krause <joerg.krause at embedded.rocks> wrote on 4 November
> > > > 2016 at 19:44:
> > > > 
> > > > 
> > > > Hi Shawn,
> > > > 
> > > > On Wed, 2016-11-02 at 09:24 +0100, Stefan Wahren wrote:
> > > > > On 02.11.2016 at 09:14, Jörg Krause wrote:
> > > > > > On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > > > > > > > Jörg Krause <joerg.krause at embedded.rocks> wrote on
> > > > > > > > 29 October 2016 at 01:07:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > You mentioned [1] an optimization in the Freescale vendor
> > > > > > > > Linux kernel [2]. I would really like to see this
> > > > > > > > optimization in the mainline kernel.
> > > > > > > > 
> > > > > > > > Did you ever try to port this code from Freescale to
> > > > > > > > mainline?
> > > > > > > 
> > > > > > > Yes, I tried once, but I was soon frustrated by the large
> > > > > > > number of required changes and the resulting issues.
> > > > > > 
> > > > > > I got the PIO mode working for the mxs-mmc driver. For this I
> > > > > > ported the PIO code from the vendor kernel and removed the
> > > > > > usage of the DMA engine entirely.
> > > > > 
> > > > > Good job
> > > > > 
> > > > > > 
> > > > > > Testing network bandwidth with iperf, I get ~10Mbit/sec with
> > > > > > PIO mode compared to ~6.5Mbit/sec with DMA mode for UDP, and
> > > > > > ~6.5Mbit/sec with PIO mode compared to ~4.5Mbit/sec with DMA
> > > > > > mode for TCP.
> > > > > 
> > > > > And how about MMC / sd card performance?
> > > > 
> > > > I noticed poor performance with the i.MX28 MMC and/or DMA driver
> > > > using the mainline kernel compared to the vendor Freescale kernel
> > > > 2.6.35. I've seen that you added the drivers to mainline some
> > > > years ago.
> > > > 
> > > > My custom i.MX28 board has a wifi chip attached to the SSP2
> > > > interface. Comparing the bandwidth with iperf, I get >20Mbits/sec
> > > > on the vendor kernel and <5Mbits/sec on the mainline kernel.
> > > 
> > > There is one thing about the clock handling. I noticed that the
> > > vendor kernel rounds up the clock frequency and the mainline kernel
> > > rounds down the clock frequency [1]. So don't trust the clock
> > > ratings from DT / board code. Better verify the register settings
> > > or check it with an oscilloscope.
> > > 
> > > [1] - http://www.spinics.net/lists/linux-mmc/msg09132.html
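
To illustrate why the rounding direction matters, here is a minimal
sketch. It is not the code of either kernel. It assumes the bit clock is
SSPCLK / (CLOCK_DIVIDE * (1 + CLOCK_RATE)) with CLOCK_DIVIDE fixed at 2;
the function names and the 25MHz request are made up for illustration.

```python
def sck_round_down(ssp_clk_hz, requested_hz, clock_divide=2):
    """Resulting SCK when the divider is rounded up (frequency rounded down)."""
    # Ceiling division: the smallest step count that keeps SCK <= requested_hz.
    steps = -(-ssp_clk_hz // (requested_hz * clock_divide))
    steps = max(steps, 1)
    return ssp_clk_hz // (clock_divide * steps)


def sck_round_up(ssp_clk_hz, requested_hz, clock_divide=2):
    """Resulting SCK when the divider is rounded down (frequency rounded up)."""
    # Floor division, clamped to 1 because a step count of 0 is not valid.
    steps = max(ssp_clk_hz // (requested_hz * clock_divide), 1)
    return ssp_clk_hz // (clock_divide * steps)


if __name__ == "__main__":
    ssp_clk = 96_000_000    # SSP CLK as reported in this thread
    requested = 25_000_000  # hypothetical request, for illustration only
    print("round down:", sck_round_down(ssp_clk, requested))
    print("round up:  ", sck_round_up(ssp_clk, requested))
```

With these assumed inputs the two strategies end up at 24MHz and 48MHz,
which is why the register value is worth checking.
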
> > 
> > I checked the clock rate setting by reading the register 0x80014070
> > (HW_SSP2_TIMING). CLOCK_DIVIDE is 0x2 and CLOCK_RATE is 0x0. As SSP
> > CLK is 96MHz, this makes a clock rate of 48MHz.
> > 
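
The arithmetic in the paragraph above can be double-checked with a few
lines. This is only a sketch: it assumes the relation
SCK = SSPCLK / (CLOCK_DIVIDE * (1 + CLOCK_RATE)) and that CLOCK_DIVIDE
sits in bits 15:8 and CLOCK_RATE in bits 7:0 of the timing register.
Both should be verified against the i.MX28 reference manual. The raw
value 0x00000200 is hypothetical (timeout bits assumed to be zero); only
the two field values come from the register read.

```python
def decode_timing(reg):
    """Split a raw timing register value into (CLOCK_DIVIDE, CLOCK_RATE).

    The bit positions are an assumption, see the note above.
    """
    clock_divide = (reg >> 8) & 0xFF
    clock_rate = reg & 0xFF
    return clock_divide, clock_rate


def ssp_sck_hz(ssp_clk_hz, clock_divide, clock_rate):
    """Bit clock from the assumed formula SSPCLK / (DIVIDE * (1 + RATE))."""
    if clock_divide == 0:
        raise ValueError("CLOCK_DIVIDE must not be zero")
    return ssp_clk_hz // (clock_divide * (1 + clock_rate))


if __name__ == "__main__":
    divide, rate = decode_timing(0x00000200)  # hypothetical raw value
    print(divide, rate, ssp_sck_hz(96_000_000, divide, rate))
```

For CLOCK_DIVIDE = 0x2, CLOCK_RATE = 0x0 and a 96MHz SSP CLK this gives
48000000, matching the 48MHz stated above.
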
> > There was a discussion on the mailing list [1] suggesting that
> > tasklets might be slow.
> > 
> > [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2011-February/043395.html
> 
> If I understand it right, the tasklet is not the problem, but the
> design of the MXS DMA driver. Please refer to the chapter "General
> Design Notes" in the documentation of the DMA provider [2].
> I think the MXS DMA driver is affected. Maybe you should ask Vinod
> Koul about this.
> 
> [2] - https://www.kernel.org/doc/Documentation/dmaengine/provider.txt

@ Vinod
In short, I noticed poor performance in the SSP2 (MMC/SD/SDIO)
interface on a custom i.MX28 board with a wifi chip attached. Comparing
the bandwidth with iperf, I get >20Mbits/sec on the vendor kernel and
<5Mbits/sec on the mainline kernel. I am trying to investigate what the
bottleneck is.

@ Stefan, all
My understanding is that the tasklet in this case is responsible for
reading the response registers of the DMA controller and returning the
response to the MMC host driver.

The vendor kernel does this in the interrupt routine of mxs-mmc by
calling complete(), whereas the mainline kernel does this in the
interrupt routine of mxs-dma by scheduling the tasklet.

To check whether this makes any difference, I replaced the tasklet
usage with the complete() infrastructure. For this I hacked the DMA
engine and the MXS DMA driver. However, the performance stays the same.

So, if I understand correctly, this is not an issue here, right? So if
not the tasklet, what do you suspect?

Jörg
