[LEDE-DEV] A state of network acceleration / test on Archer C7 v4

Florian Fainelli f.fainelli at gmail.com
Sun Jan 28 15:43:43 PST 2018


(please don't top post).

On 01/28/2018 02:00 PM, Rosen Penev wrote:
> Compared to the Archer C7v2, the v4 has a single ethernet interface
> switched between all 5 ports. The v2 has two ethernet interfaces with
> 4 ports being switched.
> 
> Now the disappointing performance has several reasons to it. The main
> one being that the ag71xx driver in OpenWrt is not very optimized for
> the hardware.

The driver certainly contributes to that, but I don't think it is the
main reason behind it. Each time you send or receive a packet, you need
to invalidate your data cache for at least 1500 bytes, or whatever the
nominal packet/buffer size has been allocated (e.g: 2KB). With very
small I and D caches (typically 64KB) and no L2 cache, you do this
thrashing very frequently and you keep hitting the DRAM as well, which
hurts performance a lot. This is just something the networking stack
does, and it is really hard to diverge from this because that is
inherently how it is designed, and how drivers are designed as well.
This is why bypassing software in hardware is so effective for low
power CPUs.
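
To put rough numbers on the cache pressure, here is a back-of-the-envelope
sketch. The 32-byte cache line and the 1538-byte on-wire frame size are my
assumptions for illustration, not measurements from this hardware; the 64KB
cache and 2KB buffer figures are the ones mentioned above.

```python
import math

CACHE_LINE = 32            # bytes per D-cache line (assumption, not from the thread)
DCACHE_SIZE = 64 * 1024    # "typically 64KB", as mentioned above
BUF_SIZE = 2048            # 2KB receive buffer, as in the example above
PAYLOAD = 1500             # minimum bytes invalidated per full-size packet
WIRE_FRAME = 1538          # assumed: 1500 + 14 header + 4 FCS + 8 preamble + 12 IFG
LINK_BPS = 1_000_000_000   # gigabit Ethernet line rate


def lines_touched(nbytes, line=CACHE_LINE):
    """Number of cache lines covering nbytes (buffer assumed line-aligned)."""
    return math.ceil(nbytes / line)


lines_per_packet = lines_touched(PAYLOAD)
lines_per_buffer = lines_touched(BUF_SIZE)

# How many 2KB buffers add up to the size of the whole D-cache.
buffers_per_dcache = DCACHE_SIZE // BUF_SIZE

# Full-size frames per second at line rate.
packets_per_second = LINK_BPS // (WIRE_FRAME * 8)

# Time for line-rate traffic to touch as many bytes as the whole D-cache.
turnover_us = buffers_per_dcache / packets_per_second * 1e6

print("cache lines per 1500-byte packet:", lines_per_packet)
print("cache lines per 2KB buffer:", lines_per_buffer)
print("2KB buffers equal to one D-cache:", buffers_per_dcache)
print("full-size packets/s at 1 Gbit/s:", packets_per_second)
print("D-cache worth of buffers every %.0f us" % turnover_us)
```

Under these assumptions a few dozen packets already cover a D-cache worth
of data, which is the kind of turnover I mean by thrashing.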

I would be curious to see the use of XDP redirect and implementing a
software NAT fast path. That is, for the most basic NAPT translation, do
this in XDP as early as possible in the driver receive/transmit part and
send directly to the outgoing interface. This should lower the pressure
on the I and D caches by invalidating not the full packet length, but
just the header portion. For more complex protocols, we would keep using
the conntrack helpers to do the necessary operation (FTP, TFTP, SIP,
etc.) on the packet. This might avoid doing a sk_buff allocation for
each packet making it through, which is expensive.
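
As a rough illustration of why a basic NAPT rewrite only needs the header
bytes, here is a userspace sketch of the arithmetic. It is not XDP/eBPF
code (a real program would be restricted C attached to the driver), and
the function names and addresses are made up for the example. It rewrites
the IPv4 source address, decrements the TTL and patches the header
checksum incrementally in the style of RFC 1624, without reading the
payload.

```python
import struct


def csum16(data: bytes) -> int:
    """Full Internet checksum (RFC 1071 style) over data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def csum_update(old_csum: int, old_words, new_words) -> int:
    """Incremental update, RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m')."""
    total = ~old_csum & 0xFFFF
    for old, new in zip(old_words, new_words):
        total += (~old & 0xFFFF) + new
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def snat_ipv4_header(hdr: bytes, new_src: bytes) -> bytes:
    """Hypothetical helper: SNAT a 20-byte IPv4 header (TTL assumed > 1)."""
    (old_ttl_proto,) = struct.unpack_from("!H", hdr, 8)   # TTL is the high byte
    (old_csum,) = struct.unpack_from("!H", hdr, 10)
    old_src = struct.unpack_from("!HH", hdr, 12)
    new_ttl_proto = old_ttl_proto - 0x0100                # decrement TTL by one
    new_src_words = struct.unpack("!HH", new_src)

    # Only the changed 16-bit words feed the checksum update.
    new_csum = csum_update(old_csum,
                           (old_ttl_proto,) + old_src,
                           (new_ttl_proto,) + new_src_words)

    out = bytearray(hdr)
    struct.pack_into("!H", out, 8, new_ttl_proto)
    struct.pack_into("!H", out, 10, new_csum)
    out[12:16] = new_src
    return bytes(out)


# Example header: 192.168.1.100 -> 8.8.8.8, TTL 64, protocol 6 (TCP).
lan_src = bytes([192, 168, 1, 100])
wan_src = bytes([203, 0, 113, 5])      # documentation address, made up
dst = bytes([8, 8, 8, 8])
hdr = bytearray(struct.pack("!BBHHHBBH4s4s",
                            0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
                            lan_src, dst))
struct.pack_into("!H", hdr, 10, csum16(bytes(hdr)))

new_hdr = snat_ipv4_header(bytes(hdr), wan_src)
# A valid header sums to zero when the checksum field is included.
print("header checksum valid after rewrite:", csum16(new_hdr) == 0)
```

The TCP/UDP checksum would need the same kind of incremental update for
the address and port change (csum_update() is generic enough for that);
it is left out here to keep the sketch short.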

> 
> Qualcomm forked the driver (in 2013 I think) and added some really
> nice features. Some of these need to be backported for ag71xx in
> OpenWrt to be competitive.

Is it possible to just drop their driver in OpenWrt and get a feeling of
the performance gap?

> 
> It's going to take quite a bit of work to get the driver up to par.
> Biggest performance boost I imagine would be to add GRO support. It
> turns out for good routing performance, GRO requires hardware
> checksumming, which is not supported by ag71xx in OpenWrt at the
> moment.

Does the hardware actually support checksum offloads?

> 
> On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling <joel at aenertia.net> wrote:
>> Hi as I also am using the archer c7's as my build targets (and c2600's) I
>> am watching this keenly; is anyone else running openvswitch on these with
>> the XDP patches?
>>
>> The c2600 which is arm a15 - currently really could do with optimization
>> and probably is a much better choice for CPE. I would not be caught dead
>> with the c7 as a 10Gbit CPE myself;
>> the SoC, even with the Openfast path patches, just can't handle complex QoS
>> scheduling (i.e Cake/PIE) beyond a couple of hundred Mbit.
>>
>>
>>
>> -Joel
>> ---
>> https://www.youtube.com/watch?v=0xSA0ljsnjc&t=1
>>
>> On 29 January 2018 at 09:43, Laurent GUERBY <laurent at guerby.net> wrote:
>>>
>>> On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote:
>>>> Hi Rafal,
>>>>
>>>> On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
>>>>> Getting better network performance (mostly for NAT) using some kind of
>>>>> acceleration was always a hot topic and people are still
>>>>> looking/asking for it. I'd like to write a short summary and share my
>>>>> understanding of the current state so that:
>>>>> 1) People can understand it better
>>>>> 2) We can have some rough plan
>>>>>
>>>>> First of all there are two possible ways of accelerating network
>>>>> traffic: in software and in hardware. The software solution is
>>>>> independent of architecture/device and is mostly just bypassing the
>>>>> in-kernel packet flow. It still uses the device's CPU, which can be a
>>>>> bottleneck. Various software implementations are reported to be 2x to
>>>>> 5x faster.
>>>>
>>>> This is what I've been observing for the software acceleration here,
>>>> see slide 19 at:
>>>>
>>>> https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf
>>>>
>>>> The flowtable representation, in software, is providing a faster
>>>> forwarding path between two nics. So it's basically an alternative to
>>>> the classic forwarding path, that is faster. Packets kick in at the
>>>> Netfilter ingress hook (right at the same location as 'tc' ingress),
>>>> if there is a hit in the software flowtable, ttl gets decremented,
>>>> NATs are done and the packet is placed in the destination NIC via
>>>> neigh_xmit() - through the neighbour layer.
>>>
>>> Hi Pablo,
>>>
>>> I tested today a few things on a brand new TP-Link Archer C7 v4.0,
>>> LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
>>> WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
>>> (everything on the same table), IPv4 unless specified,
>>> using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
>>>
>>> With the TP-Link firmware:
>>> - wired 930+ Mbit/s both ways
>>> - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
>>> - wireless 2.4G 100+ Mbit/s both ways
>>>
>>> With OpenWRT/LEDE trunk 20180128 4.4 kernel:
>>> - wired 350-400 Mbit/s both ways
>>> - wired with firewall deactivated 550 Mbit/s
>>>   (just "iptables -t nat -A POSTROUTING -j MASQUERADE")
>>> - wired IPv6 routing, no NAT, no firewall 250 Mbit/s
>>> - wireless 5G 150-200 Mbit/s
>>> - wireless 2.4G forgot to test
>>>
>>> top on the router shows sirq at 90%+ during network load, other load
>>> indicators are under 5%.
>>>
>>> IPv6 performance without NAT being below IPv4 with NAT seems
>>> to indicate there are potential gains in software :).
>>>
>>> I didn't test OpenWRT in bridge mode but I got with LEDE 17.01
>>> on an Archer C7 v2 about 550-600 Mbit/s iperf3 so I think
>>> radio is good on these ath10k routers.
>>>
>>> So if OpenWRT can do about x2 in software routing performance we're
>>> good against our TP-Link firmware friends :).
>>>
>>> tetaneutral.net (not-for-profit ISP, hosting OpenWRT and LEDE mirror in
>>> FR) is going to install 40+ Archer C7 v4 running OpenWRT as CPE, each
>>> with individual gigabit fiber uplink (TP-Link MC220L fiber converter),
>>> and total 10G uplink (Dell/Force10 S4810 48x10G, yes some of our
>>> members will get 10G on their PC at home :).
>>>
>>> We build our images from git source, generating imagebuilder and then a
>>> custom python script. We have 5+ spare C7, fast build (20mn from
>>> scratch) and testing environment, and of course we're interested in
>>> suggestions on what to do.
>>>
>>> Thanks in advance for your help,
>>>
>>> Sincerely,
>>>
>>> Laurent
>>> http://tetaneutral.net
>>>
>>>
>>> _______________________________________________
>>> Lede-dev mailing list
>>> Lede-dev at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/lede-dev
>>

-- 
Florian


