[BUG,REGRESSION?] 3.11.6+,3.12: GbE iface rate drops to few KB/s

Eric Dumazet eric.dumazet at gmail.com
Thu Nov 21 17:00:01 EST 2013


On Thu, 2013-11-21 at 22:52 +0100, Willy Tarreau wrote:
> On Thu, Nov 21, 2013 at 10:51:09PM +0100, Arnaud Ebalard wrote:
> > Hi,
> > 
> > Willy Tarreau <w at 1wt.eu> writes:
> > 
> > > OK it paid off. And very well :-)
> > >
> > > I did it at once and it worked immediately. I generally don't like this
> > > because I always fear that some bug was left there hidden in the code. I have
> > > only tested it on the Mirabox, so I'll have to try on the OpenBlocks AX3-4 and
> > > on the XP-GP board for some SMP stress tests.
> > >
> > > I upgraded my Mirabox to latest Linus' git (commit 5527d151) and compared
> > > with and without the patch.
> > >
> > >   without :
> > >       - need at least 12 streams to reach gigabit.
> > >       - 60% of idle CPU remains at 1 Gbps
> > >       - HTTP connection rate on empty objects is 9950 connections/s
> > >       - cumulated outgoing traffic on two ports reaches 1.3 Gbps
> > >
> > >   with the patch :
> > >       - a single stream easily saturates the gigabit
> > >       - 87% of idle CPU at 1 Gbps (12 streams, 90% idle at 1 stream)
> > >       - HTTP connection rate on empty objects is 10250 connections/s
> > >       - I saturate the two gig ports at 99% CPU, so 2 Gbps sustained output.
> > >
> > > BTW I must say I was impressed to see that big an improvement in CPU
> > > usage between 3.10 and 3.13, I suspect some of the Tx queue improvements
> > > that Eric has done in between account for this.
> > >
> > > I cut the patch in 3 parts :
> > >    - one which reintroduces the hidden bits of the driver
> > >    - one which replaces the timer with the IRQ
> > >    - one which changes the default Tx coalesce from 16 to 4 packets
> > >      (larger was preferred with the timer, but less is better now).
> > >
> > > I'm attaching them, please test them on your device.
> > 
> > Well, on the RN102 (Armada 370), I get the same results as with your
> > previous patch, i.e. netperf and nginx saturate the link. Apache still
> > lagging behind though.
> > 
> > > Note that this is *not* for inclusion at the moment as it has not been
> > > tested on the SMP CPUs.
> > 
> > I tested it on my RN2120 (2-core armada XP): I got no problem and the
> > link saturated w/ apache, nginx and netperf. Good work!
> 
> Great, thanks for your tests Arnaud. I forgot to mention that all my
> tests this evening involved this patch as well.

Now you might try to set a lower value
for /proc/sys/net/ipv4/tcp_limit_output_bytes

Ideally, a value of 8192 (instead of 131072) queues less data
per TCP flow and reacts faster to losses, as retransmits don't
have to wait until the previous packets in the Qdisc have left
the host.
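
For anyone who wants to script the change, here is a minimal sketch in
Python. The helper names read_limit and write_limit are made up for
illustration. Only the /proc path and the two values come from this
mail. Writing to the file is assumed to need root.

```python
from pathlib import Path

# Path taken from the mail above; everything else here is illustrative.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/tcp_limit_output_bytes")


def read_limit(path=SYSCTL_PATH):
    """Return the current per-flow output limit in bytes."""
    return int(Path(path).read_text().strip())


def write_limit(value, path=SYSCTL_PATH):
    """Set the per-flow output limit in bytes (assumed to require root)."""
    if value <= 0:
        raise ValueError("limit must be a positive number of bytes")
    Path(path).write_text(f"{value}\n")
```

Usage would be something like write_limit(8192) before a test run and
write_limit(131072) afterwards to restore the previous value.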

131072 bytes for an 80 Mbit/s flow means more than 11 ms of queueing :(
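
The arithmetic behind that figure can be checked with a small sketch.
It assumes the whole limit sits queued in front of a new packet and
that the flow drains at a constant rate. The function name is
hypothetical.

```python
def queue_delay_ms(queued_bytes, rate_bits_per_s):
    """Time in milliseconds to drain queued_bytes at a constant bit rate."""
    return queued_bytes * 8 / rate_bits_per_s * 1000.0


# Compare the two limits discussed above for an 80 Mbit/s flow.
for limit in (131072, 8192):
    print(f"{limit} bytes -> {queue_delay_ms(limit, 80_000_000):.2f} ms")
```

Under those assumptions the 131072 byte limit works out to roughly
13 ms at 80 Mbit/s, and 8192 bytes to under 1 ms.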





