[LEDE-DEV] [PATCH] ag71xx: Add some unlikely calls + rearange some stuff in hard_start_xmit.
Rosen Penev
rosenp at gmail.com
Fri Feb 16 14:44:34 PST 2018
On Thu, Feb 15, 2018 at 3:29 AM, Felix Fietkau <nbd at nbd.name> wrote:
> On 2018-02-14 16:20, Rosen Penev wrote:
>> On Tue, Feb 13, 2018 at 11:10 PM, Felix Fietkau <nbd at nbd.name> wrote:
>>> On 2018-02-13 23:53, Rosen Penev wrote:
>>>> Based on Qualcomm driver. Improves iperf3 throughput by ~20mbps on transmit on Archer C7v4.
>>>>
>>>> Signed-off-by: Rosen Penev <rosenp at gmail.com>
>>>> ---
>>>> .../drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c | 14 +++++++-------
>>>> 1 file changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
>>>> index 95682b7641..d32f220178 100644
>>>> --- a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
>>>> +++ b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
>>>> @@ -797,11 +797,14 @@ static netdev_tx_t ag71xx_hard_start_xmit(struct sk_buff *skb,
>>>> if (ag71xx_has_ar8216(ag))
>>>> ag71xx_add_ar8216_header(ag, skb);
>>>>
>>>> - if (skb->len <= 4) {
>>>> + dma_cache_sync (NULL, skb->data, skb->len, DMA_TO_DEVICE);
>>> The use of dma_cache_sync here makes no sense, since it's the wrong API.
>>> Also, effectively it results in the same kind of cache flush as the one
>>> that's done by the DMA mapping done later.
>>
>> From my reading of dma_cache_flush, I agree. However that's the part
>> of this patch that gives the biggest speedup. Before this patch, I
>> tested just adding that and it worked. I can back this up with
>> benchmarks later on.
> In my test I quite often encountered big differences in throughput from
> minor (and often very much unrelated) changes. I haven't found the real
> source of these differences yet, so it's hard to know which changes
> really help in the long run. My best guess is that some changes affect
> the alignment of critical functions and thus affect the icache footprint
> somehow.
>
> Until I see a reasonable explanation of how this change helps, I'm going
> to assume it's the same kind of random change fluctuation that I've seen
> so often and NACK this change.
So I just benchmarked this while not connected to anything. The setup:
the WAN port is assigned to the LAN interface (it's the same CPU port,
eth0) and br-lan is removed, so that eth0.1 is assigned to LAN.
I noticed something bizarre. With the patch applied, I consistently got
~150 Mbps on iperf3. Without the patch, I got the same speed, EXCEPT
that it would sometimes jump to ~250 Mbps.
Adding dma_cache_sync eliminated this boost. I have no explanation for
this. Maybe the TX queue gets full? Sample result showing the jump:
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.1, port 40080
[  5] local 192.168.1.2 port 5201 connected to 192.168.1.1 port 40082
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  17.0 MBytes   143 Mbits/sec
[  5]   1.00-2.00   sec  18.1 MBytes   152 Mbits/sec
[  5]   2.00-3.00   sec  17.5 MBytes   147 Mbits/sec
[  5]   3.00-4.00   sec  17.6 MBytes   148 Mbits/sec
[  5]   4.00-5.00   sec  17.8 MBytes   149 Mbits/sec
[  5]   5.00-6.00   sec  18.0 MBytes   151 Mbits/sec
[  5]   6.00-7.00   sec  25.5 MBytes   214 Mbits/sec
[  5]   7.00-8.00   sec  30.9 MBytes   259 Mbits/sec
[  5]   8.00-9.00   sec  30.4 MBytes   255 Mbits/sec
[  5]   9.00-10.00  sec  31.1 MBytes   261 Mbits/sec
[  5]  10.00-10.06  sec  1.77 MBytes   250 Mbits/sec
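
To put numbers on the jump in the sample above, here is a rough userspace
sketch (not driver code) that finds where the bitrate steps up and
averages each side. The sample values are copied from the run above,
minus the partial last interval. The helper names and the 1.25x
threshold are my own arbitrary choices, not anything from iperf3 or the
ag71xx driver.

```c
#include <stddef.h>
#include <stdio.h>

/* Per-second bitrates in Mbits/sec, copied from the iperf3 run above.
 * The final partial interval (10.00-10.06) is left out on purpose. */
static const double samples[] = {
	143, 152, 147, 148, 149, 151, 214, 259, 255, 261
};
#define NUM_SAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Arithmetic mean of n values; returns 0 for an empty range. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	if (n == 0)
		return 0.0;
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Hypothetical helper: index of the first sample that exceeds the mean
 * of all earlier samples by more than `ratio`. Returns n if no sample
 * does. The ratio is an assumed, hand-picked threshold. */
static size_t find_jump(const double *v, size_t n, double ratio)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i > 0 && v[i] > ratio * (sum / (double)i))
			return i;
		sum += v[i];
	}
	return n;
}

/* Print the jump position and the mean bitrate on each side of it. */
static void print_summary(void)
{
	size_t at = find_jump(samples, NUM_SAMPLES, 1.25);

	printf("jump at second %zu\n", at);
	printf("mean before: %.1f Mbits/sec\n", mean(samples, at));
	printf("mean after:  %.1f Mbits/sec\n",
	       mean(samples + at, NUM_SAMPLES - at));
}
```

Calling print_summary() from main() puts the step at the 6.00-7.00
interval, with means of roughly 148 and 247 Mbits/sec on either side.
That matches the ~150 vs ~250 figures I quoted, but it says nothing
about the cause.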
>
> - Felix