[LEDE-DEV] A state of network acceleration / test on Archer C7 v4

Joel Wirāmu Pauling joel at aenertia.net
Sun Jan 28 15:58:01 PST 2018


FYI - the Openfast path patches are applied to several trees. I am
running them on a c7 v2 right now and am able to hit close to stock
numbers.

The NAT acceleration stuff isn't needed at all with the open-fastpath patches.

relevant thread:
https://forum.lede-project.org/t/qualcomm-fast-path-for-lede/4582

-Joel

On 29 January 2018 at 12:43, Florian Fainelli <f.fainelli at gmail.com> wrote:
> (please don't top post).
>
> On 01/28/2018 02:00 PM, Rosen Penev wrote:
>> Compared to the Archer C7v2, the v4 has a single ethernet interface
>> switched between all 5 ports. The v2 has two ethernet interfaces with
>> 4 ports being switched.
>>
>> Now the disappointing performance has several reasons to it. The main
>> one being that the ag71xx driver in OpenWrt is not very optimized for
>> the hardware.
>
> The driver certainly contributes to that, but I don't think it is the
> main reason behind it. Each time you send or receive a packet, you need
> to invalidate your data cache for at least 1500 bytes, or whatever
> nominal packet/buffer size has been allocated (e.g. 2KB). With very
> small I and D caches (typically 64KB) and no L2 cache, you do this
> thrashing very frequently and you keep hitting the DRAM as well, which
> hurts performance a lot. This is just something the networking stack
> does, and it is really hard to diverge from this because that is
> inherently how it is designed, and how drivers are designed as well.
> This is why software bypasses in hardware are so effective for
> low-power CPUs.
>
> I would be curious to see the use of XDP redirect and implementing a
> software NAT fast path, that is, for the most basic NAPT translation, do
> this in XDP as early as possible in the driver receive/transmit part and
> send directly to the outgoing interface, this should lower the pressure
> on the I and D caches by invalidating not the full packet length, but
> just the header portion. For more complex protocols, we would keep using
> the conntrack helpers to do the necessary operation (FTP, TFTP, SIP,
> etc..) on the packet. This might avoid doing a sk_buff allocation for
> each packet making it through, which is expensive.
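
To put rough numbers on the cache argument above, here is a small
back-of-the-envelope sketch. The 32-byte cache line and the 54-byte
Ethernet + IPv4 + TCP header (no options) are my assumptions, not
figures from this thread; the 1500-byte, 2KB and 64KB values are the
ones quoted above. It also assumes buffers start on a cache-line
boundary, so treat it as an illustration rather than a measurement.

```python
import math

CACHE_LINE = 32            # bytes per D-cache line (assumption, check your SoC)
DCACHE_SIZE = 64 * 1024    # 64KB D-cache, as quoted above
HEADER_LEN = 14 + 20 + 20  # Ethernet + IPv4 + TCP, no options (assumption)


def lines_touched(nbytes, line=CACHE_LINE):
    """Number of cache lines covering nbytes of a line-aligned buffer."""
    return math.ceil(nbytes / line)


for label, nbytes in (("full 1500-byte packet", 1500),
                      ("full 2KB buffer", 2048),
                      ("headers only", HEADER_LEN)):
    n = lines_touched(nbytes)
    share = 100.0 * n * CACHE_LINE / DCACHE_SIZE
    print(f"{label}: {n} lines, {share:.2f}% of the D-cache")
```

Under those assumptions the header-only case touches 2 cache lines per
packet instead of 47 (1500 bytes) or 64 (2KB buffer).
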
>
>>
>> Qualcomm forked the driver (in 2013 I think) and added some really
>> nice features. Some of these need to be backported for ag71xx in
>> OpenWrt to be competitive.
>
> Is it possible to just drop their driver in OpenWrt and get a feeling of
> the performance gap?
>
>>
>> It's going to take quite a bit of work to get the driver up to par.
>> Biggest performance boost I imagine would be to add GRO support. It
>> turns out for good routing performance, GRO requires hardware
>> checksumming, which is not supported by ag71xx in OpenWrt at the
>> moment.
>
> Does the hardware actually support checksum offloads?
>
>>
>> On Sun, Jan 28, 2018 at 1:14 PM, Joel Wirāmu Pauling <joel at aenertia.net> wrote:
>>> Hi, as I am also using Archer C7s (and C2600s) as my build targets, I
>>> am watching this keenly. Is anyone else running openvswitch on these with
>>> the XDP patches?
>>>
>>> The C2600, which is ARM A15, could currently really do with optimization
>>> and is probably a much better choice for CPE. I would not be caught dead
>>> with the C7 as a 10Gbit CPE myself: the SoC, even with the Openfast path
>>> patches, just can't handle complex QoS scheduling (e.g. Cake/PIE) beyond
>>> a couple of hundred Mbit.
>>>
>>>
>>>
>>> -Joel
>>> ---
>>> https://www.youtube.com/watch?v=0xSA0ljsnjc&t=1
>>>
>>> On 29 January 2018 at 09:43, Laurent GUERBY <laurent at guerby.net> wrote:
>>>>
>>>> On Wed, 2018-01-17 at 19:30 +0100, Pablo Neira Ayuso wrote:
>>>>> Hi Rafal,
>>>>>
>>>>> On Wed, Jan 17, 2018 at 04:25:10PM +0100, Rafał Miłecki wrote:
>>>>>> Getting better network performance (mostly for NAT) using some kind
>>>>>> of acceleration was always a hot topic and people are still
>>>>>> looking/asking for it. I'd like to write a short summary and share
>>>>>> my understanding of the current state so that:
>>>>>> 1) People can understand it better
>>>>>> 2) We can have some rough plan
>>>>>>
>>>>>> First of all there are two possible ways of accelerating network
>>>>>> traffic: in software and in hardware. The software solution is
>>>>>> independent of architecture/device and mostly just bypasses the
>>>>>> in-kernel packet flow. It still uses the device's CPU, which can be
>>>>>> a bottleneck. Various software implementations are reported to be
>>>>>> 2x to 5x faster.
>>>>>
>>>>> This is what I've been observing for the software acceleration here,
>>>>> see slide 19 at:
>>>>>
>>>>> https://www.netdevconf.org/2.1/slides/apr8/ayuso-netdev-netfilter-updates-canada-2017.pdf
>>>>>
>>>>> The flowtable representation, in software, is providing a faster
>>>>> forwarding path between two nics. So it's basically an alternative to
>>>>> the classic forwarding path, that is faster. Packets kick in at the
>>>>> Netfilter ingress hook (right at the same location as 'tc' ingress),
>>>>> if there is a hit in the software flowtable, ttl gets decremented,
>>>>> NATs are done and the packet is placed in the destination NIC via
>>>>> neigh_xmit() - through the neighbour layer.
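
The fast path described above can be modelled in a few lines. This is
only a toy sketch of the idea, not the kernel implementation: the
Packet, FlowEntry and FlowTable names, the offload() helper, the TTL
guard and the example addresses are all hypothetical, and a real
flowtable would also handle the reply direction, timeouts and
checksums.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    ttl: int


@dataclass
class FlowEntry:
    nat_src: str    # source address after NAT
    nat_sport: int  # source port after NAT
    out_dev: str    # destination NIC name


class FlowTable:
    """Toy software flowtable keyed by the 5-tuple (hypothetical model)."""

    def __init__(self):
        self.flows = {}

    @staticmethod
    def key(pkt):
        return (pkt.src, pkt.sport, pkt.dst, pkt.dport, pkt.proto)

    def offload(self, pkt, entry):
        """Slow path installs an entry once the connection is known."""
        self.flows[self.key(pkt)] = entry

    def ingress(self, pkt):
        """Return the output device on a hit, None to use the classic path."""
        entry = self.flows.get(self.key(pkt))
        if entry is None or pkt.ttl <= 1:
            return None                      # miss: classic forwarding path
        pkt.ttl -= 1                         # ttl gets decremented
        pkt.src, pkt.sport = entry.nat_src, entry.nat_sport  # NAT is done
        return entry.out_dev                 # stand-in for neigh_xmit()


table = FlowTable()
first = Packet("192.168.1.2", 40000, "203.0.113.5", 5201, "tcp", 64)
assert table.ingress(first) is None          # first packet misses
table.offload(first, FlowEntry("198.51.100.1", 40000, "wan0"))

nxt = Packet("192.168.1.2", 40000, "203.0.113.5", 5201, "tcp", 64)
print(table.ingress(nxt), nxt.src, nxt.ttl)  # wan0 198.51.100.1 63
```

The point of the model is that a hit costs one hash lookup and a header
rewrite, while everything else still goes through the classic path.
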
>>>>
>>>> Hi Pablo,
>>>>
>>>> I tested today a few things on a brand new TP-Link Archer C7 v4.0,
>>>> LAN client Dell Latitude 7480 (eth I219-LM, wifi 8265 / 8275)
>>>> WAN server NUC5i3RYB (eth I218-V), NAT between them, <1 ms latency
>>>> (everything on the same table), IPv4 unless specified,
>>>> using iperf3 LAN=>WAN and -R for WAN=>LAN (both TCP).
>>>>
>>>> With the TP-Link firmware:
>>>> - wired 930+ Mbit/s both ways
>>>> - wireless 5G 560+ Mbit/s down 440+ Mbit/s up
>>>> - wireless 2.4G 100+ Mbit/s both ways
>>>>
>>>> With OpenWRT/LEDE trunk 20180128 4.4 kernel:
>>>> - wired 350-400 Mbit/s both ways
>>>> - wired with firewall deactivated 550 Mbit/s
>>>>   (just "iptables -t nat -A POSTROUTING -j MASQUERADE")
>>>> - wired IPv6 routing, no NAT, no firewall 250 Mbit/s
>>>> - wireless 5G 150-200 Mbit/s
>>>> - wireless 2.4G forgot to test
>>>>
>>>> top on the router shows sirq at 90%+ during network load, other load
>>>> indicators are under 5%.
>>>>
>>>> IPv6 performance without NAT being below IPv4 with NAT seems
>>>> to indicate there are potential gains in software :).
>>>>
>>>> I didn't test OpenWRT in bridge mode, but with LEDE 17.01 on an
>>>> Archer C7 v2 I got about 550-600 Mbit/s in iperf3, so I think the
>>>> radio is good on these ath10k routers.
>>>>
>>>> So if OpenWRT can do about x2 in software routing performance we're
>>>> good against our TP-Link firmware friends :).
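
For reference, the gap implied by the wired numbers above can be worked
out directly. This sketch uses only the figures quoted in the test
report; the speedup_needed() helper is just an illustrative name, and
930 Mbit/s is treated as the target even though the report says "930+".

```python
STOCK_WIRED = 930  # Mbit/s, TP-Link firmware, wired, from the report above

# OpenWRT/LEDE wired results from the report above, in Mbit/s
openwrt_wired = {
    "default firewall, low end": 350,
    "default firewall, high end": 400,
    "firewall deactivated": 550,
}


def speedup_needed(measured, target=STOCK_WIRED):
    """Factor by which measured throughput must grow to reach target."""
    return target / measured


for label, mbit in openwrt_wired.items():
    factor = speedup_needed(mbit)
    print(f"{label}: {mbit} Mbit/s, needs x{factor:.2f} to reach {STOCK_WIRED}")
```

That works out to roughly x2.3 to x2.7 with the default firewall and
about x1.7 with the firewall deactivated, so "about x2" is the right
ballpark.
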
>>>>
>>>> tetaneutral.net (not-for-profit ISP, hosting OpenWRT and LEDE mirror in
>>>> FR) is going to install 40+ Archer C7 v4 running OpenWRT as CPE, each
>>>> with individual gigabit fiber uplink (TP-Link MC220L fiber converter),
>>>> and total 10G uplink (Dell/Force10 S4810 48x10G, yes some of our
>>>> members will get 10G on their PC at home :).
>>>>
>>>> We build our images from git source, generating imagebuilder and then a
>>>> custom Python script. We have 5+ spare C7s, a fast build (20 min from
>>>> scratch) and testing environment, and of course we're interested in
>>>> suggestions on what to do.
>>>>
>>>> Thanks in advance for your help,
>>>>
>>>> Sincerely,
>>>>
>>>> Laurent
>>>> http://tetaneutral.net
>>>>
>>>>
>>>> _______________________________________________
>>>> Lede-dev mailing list
>>>> Lede-dev at lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/lede-dev
>>>
>>
>>
>
> --
> Florian


