[LEDE-DEV] [PATCH] ag71xx: Add some unlikely calls + rearrange some stuff in hard_start_xmit.

Rosen Penev rosenp at gmail.com
Fri Feb 16 14:44:34 PST 2018


On Thu, Feb 15, 2018 at 3:29 AM, Felix Fietkau <nbd at nbd.name> wrote:
> On 2018-02-14 16:20, Rosen Penev wrote:
>> On Tue, Feb 13, 2018 at 11:10 PM, Felix Fietkau <nbd at nbd.name> wrote:
>>> On 2018-02-13 23:53, Rosen Penev wrote:
>>>> Based on the Qualcomm driver. Improves iperf3 transmit throughput by ~20 Mbps on the Archer C7 v4.
>>>>
>>>> Signed-off-by: Rosen Penev <rosenp at gmail.com>
>>>> ---
>>>>  .../drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c      | 14 +++++++-------
>>>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
>>>> index 95682b7641..d32f220178 100644
>>>> --- a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
>>>> +++ b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
>>>> @@ -797,11 +797,14 @@ static netdev_tx_t ag71xx_hard_start_xmit(struct sk_buff *skb,
>>>>       if (ag71xx_has_ar8216(ag))
>>>>               ag71xx_add_ar8216_header(ag, skb);
>>>>
>>>> -     if (skb->len <= 4) {
>>>> +     dma_cache_sync (NULL, skb->data, skb->len, DMA_TO_DEVICE);
>>> The use of dma_cache_sync here makes no sense, since it's the wrong API.
>>> Also, it effectively results in the same kind of cache flush as the one
>>> performed by the DMA mapping later on.
>>
>> From my reading of dma_cache_sync, I agree. However, that's the part
>> of this patch that gives the biggest speedup. Before this patch, I
>> tested adding just that line on its own and it worked. I can back this
>> up with benchmarks later on.
> In my tests I quite often encountered big differences in throughput from
> minor (and often completely unrelated) changes. I haven't found the real
> source of these differences yet, so it's hard to know which changes
> really help in the long run. My best guess is that some changes affect
> the alignment of critical functions and thus change the icache footprint
> somehow.
>
> Until I see a reasonable explanation of how this change helps, I'm going
> to assume it's the same kind of random fluctuation I've seen so often
> and NACK this change.

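For context, the cache flush Felix is referring to above is the writeback
that the streaming DMA mapping already performs further down the TX path.
A minimal sketch of that pattern inside a hard_start_xmit handler (the
device pointer and the error handling here are illustrative, not copied
from the ag71xx driver):

/* dma_map_single() with DMA_TO_DEVICE already writes the skb data back
 * out of the CPU cache before the hardware sees it, so an earlier
 * dma_cache_sync() on the same buffer just repeats the same flush. */
dma_addr_t dma_addr;

dma_addr = dma_map_single(&dev->dev, skb->data, skb->len, DMA_TO_DEVICE);
if (unlikely(dma_mapping_error(&dev->dev, dma_addr))) {
        dev_kfree_skb_any(skb);
        dev->stats.tx_dropped++;
        return NETDEV_TX_OK;
}
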
So I just benchmarked this with the router not connected to anything
else. Basically, the WAN port was assigned to the LAN interface (it's
the same CPU port, eth0) and br-lan was removed so that eth0.1 is
assigned directly to the LAN interface.

I noticed something bizarre. With the patch applied, I was consistently
getting ~150 Mbps on iperf3. Without the patch, I was getting the same
speed, EXCEPT that the rate would sometimes jump up to ~250 Mbps.

Adding dma_cache_sync eliminated this boost. I have no explanation for
it. Maybe the TX queue gets full? (See the sketch after the iperf3
output below.) Sample result:

Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.1, port 40080
[  5] local 192.168.1.2 port 5201 connected to 192.168.1.1 port 40082
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  17.0 MBytes   143 Mbits/sec
[  5]   1.00-2.00   sec  18.1 MBytes   152 Mbits/sec
[  5]   2.00-3.00   sec  17.5 MBytes   147 Mbits/sec
[  5]   3.00-4.00   sec  17.6 MBytes   148 Mbits/sec
[  5]   4.00-5.00   sec  17.8 MBytes   149 Mbits/sec
[  5]   5.00-6.00   sec  18.0 MBytes   151 Mbits/sec
[  5]   6.00-7.00   sec  25.5 MBytes   214 Mbits/sec
[  5]   7.00-8.00   sec  30.9 MBytes   259 Mbits/sec
[  5]   8.00-9.00   sec  30.4 MBytes   255 Mbits/sec
[  5]   9.00-10.00  sec  31.1 MBytes   261 Mbits/sec
[  5]  10.00-10.06  sec  1.77 MBytes   250 Mbits/sec
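
In case the drop back to ~150 Mbps really is the TX ring filling up, the
usual ndo_start_xmit pattern for that condition looks roughly like the
sketch below. This is hypothetical code for illustration; struct
example_priv and its fields are made up and not taken from ag71xx.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical per-device state; the field names are illustrative. */
struct example_priv {
        unsigned int tx_used;   /* descriptors currently in flight */
        unsigned int tx_size;   /* total descriptors in the TX ring */
};

static netdev_tx_t example_hard_start_xmit(struct sk_buff *skb,
                                           struct net_device *dev)
{
        struct example_priv *priv = netdev_priv(dev);

        /* With no free TX descriptors, stop the queue and return BUSY
         * so the stack holds on to the skb until the TX completion
         * interrupt frees entries and the driver wakes the queue. */
        if (unlikely(priv->tx_used >= priv->tx_size)) {
                netif_stop_queue(dev);
                return NETDEV_TX_BUSY;
        }

        /* ... map skb->data for DMA, fill a descriptor, start the
         * transfer; the skb is freed later on TX completion ... */
        priv->tx_used++;
        return NETDEV_TX_OK;
}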

>
> - Felix


