OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)

Dave Taht dave.taht at gmail.com
Fri May 6 12:43:03 PDT 2016


On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin <leroi.lists at gmail.com> wrote:
> On 6 May 2016 at 21:43, Roman Yeryomin <leroi.lists at gmail.com> wrote:
>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer <brouer at redhat.com> wrote:
>>>
>>> I've created an OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>> closed Felix's OpenWRT email account (bad choice! emails bouncing).
>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>> are in some kind of conflict.
>>>
>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>
>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>
>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>
> Forgot to mention, I've reduced drop_batch_size down to 32

0) Not clear to me if that's the right line: there are 4 wifi queues,
and the third one is the BE queue. That limit is also too low for
normal use. And for the purpose of this particular UDP test, flows 16
is ok, but not ideal.

1) What's the tcp number (with a simultaneous ping) with this latest patchset?
(I care about tcp performance a lot more than udp floods - surviving a
udp flood yes, performance, no)

before/after?

tc -s qdisc show dev wlan0 during/after results?

If you are doing builds for the archer c7v2, I can join in on this... (?)

I did do a test of the ath10k "before", fq_codel *never engaged*, and
tcp induced latencies under load, e at 100mbit, cracked 600ms, while
staying flat (20ms) at 100mbit. (not the same patches you are testing)
on x86. I have got tcp 300Mbit out of an osx box, similar latency,
have yet to get anything more on anything I currently have
before/after patchsets.

I'll go add flooding to the tests, I just finished a series comparing
two different speed stations and life was good on that.

"before" - fq_codel never engages, we see seconds of latency under load.

root@apu2:~# tc -s qdisc show dev wlp4s0
qdisc mq 0: root
 Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 2262 bytes 17 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 220486569 bytes 152058 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 18168 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
  new_flows_len 1 old_flows_len 3
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 1 old_flows_len 0
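
For anyone who wants to check the same thing on their own box, here is a rough Python sketch that reads `tc -s qdisc show` output like the above and reports, per queue, whether any drops or ECN marks were recorded. The function name `parse_tc_stats` and the "acted" heuristic (any drop or mark counts as fq_codel doing something) are assumptions made for illustration, not anything tc provides. The sample is a trimmed copy of the output above (mq root plus queues :3 and :4).

```python
import re

# Trimmed copy of the "before" output shown above.
SAMPLE = """\
qdisc mq 0: root
 Sent 8570563893 bytes 6326983 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 8340546509 bytes 6163431 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 68130 drop_overlimit 0 new_flow_count 120050 ecn_mark 0
  new_flows_len 1 old_flows_len 3
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514
target 5.0ms interval 100.0ms ecn
 Sent 9528553 bytes 11477 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
  new_flows_len 1 old_flows_len 0
"""


def parse_tc_stats(text):
    """Return one dict per qdisc found in `tc -s qdisc show` output."""
    # A new record starts at each line beginning with "qdisc "; every other
    # line (including mail-wrapped header lines) is appended to the record.
    blocks = []
    for line in text.splitlines():
        if line.startswith("qdisc "):
            blocks.append(line.strip())
        elif blocks:
            blocks[-1] += " " + line.strip()

    stats = []
    for block in blocks:
        sent = re.search(r"Sent (\d+) bytes (\d+) pkt \(dropped (\d+)", block)
        if not sent:
            continue
        parent = re.search(r"parent (\S+)", block)
        ecn = re.search(r"ecn_mark (\d+)", block)
        dropped = int(sent.group(3))
        marks = int(ecn.group(1)) if ecn else 0
        stats.append({
            "kind": block.split()[1],
            "parent": parent.group(1) if parent else None,
            "bytes": int(sent.group(1)),
            "packets": int(sent.group(2)),
            "dropped": dropped,
            "ecn_mark": marks,
            # Heuristic (assumption): any drop or ECN mark means the qdisc acted.
            "acted": dropped > 0 or marks > 0,
        })
    return stats


for q in parse_tc_stats(SAMPLE):
    state = "acted" if q["acted"] else "never engaged"
    print(q["kind"], q["parent"], q["bytes"], state)
```

On the sample, nearly all the bytes go through parent :3 and every queue shows zero drops and zero marks, which is what "never engages" looks like in the counters.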


>> This is certainly better than 30Mbps but still more than two times
>> less than before (900).

What I am still not sure we established: were you sending 900mbit udp
and receiving 900mbit in the prior tests?

>> TCP also improved a little (550 to ~590).

The limit is probably a bit low, also.  You might want to try target
20ms as well.

>>
>> Felix, others, do you want to see the ported patch, maybe I did something wrong?
>> Doesn't look like it will save ath10k from performance regression.

What was tcp "before"? (I'm sorry, such a long thread)

>>
>>>
>>> On Fri, 6 May 2016 11:42:43 +0200
>>> Jesper Dangaard Brouer <brouer at redhat.com> wrote:
>>>
>>>> Hi Felix,
>>>>
>>>> This is an important fix for OpenWRT, please read!
>>>>
>>>> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
>>>> without also adjusting q->flows_cnt.  Eric explains below that you must
>>>> also adjust the buckets (q->flows_cnt) for this not to break. (Just
>>>> adjust it to 128)
>>>>
>>>> Problematic OpenWRT commit in question:
>>>>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>>>>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from causing too much cpu load with higher speed (#21326)")
>>>>
>>>>
>>>> I also highly recommend you cherry-pick this very recent commit:
>>>>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>>>>  https://git.kernel.org/davem/net-next/c/9d18562a227
>>>>
>>>> This should fix very high CPU usage in case fq_codel goes into drop mode.
>>>> The problem is that drop mode was considered rare, and implementation
>>>> wise it was chosen to be more expensive (to save cycles on normal mode).
>>>> Unfortunately it is easy to trigger with a UDP flood. Drop mode is
>>>> especially expensive for smaller devices, as it scans a 4K big array,
>>>> thus 64 cache misses for small devices!
>>>>
>>>> The fix is to allow drop-mode to bulk-drop more packets when entering
>>>> drop-mode (default 64 bulk drop).  That way we don't suddenly
>>>> experience a significantly higher processing cost per packet, but
>>>> instead can amortize this.
>>>>
>>>> To Eric, should we recommend OpenWRT to adjust default (max) 64 bulk
>>>> drop, given we also recommend bucket size to be 128 ? (thus the amount
>>>> of memory to scan is less, but their CPU is also much smaller).
>>>>
>>>> --Jesper
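
The "4K array, 64 cache misses" figure above can be reproduced with a little arithmetic. This is only a sketch: it assumes 4 bytes of backlog counter per bucket and a 64-byte cache line (both inferred from the 4K and 64 numbers quoted above, with 1024 buckets), and it assumes the worst case where every line touched is a miss. The function names are made up for illustration.

```python
def cache_lines_scanned(buckets, bytes_per_bucket=4, cache_line=64):
    """Cache lines touched when scanning one counter per bucket."""
    total_bytes = buckets * bytes_per_bucket
    # Ceiling division: a partially used line still has to be fetched.
    return -(-total_bytes // cache_line)


def misses_per_dropped_packet(buckets, batch):
    """Worst-case scan cost amortized over one batch of dropped packets."""
    return cache_lines_scanned(buckets) / batch


print(cache_lines_scanned(1024))            # default 1024 buckets
print(cache_lines_scanned(128))             # suggested 128 buckets
print(misses_per_dropped_packet(1024, 1))   # one packet dropped per scan
print(misses_per_dropped_packet(1024, 64))  # 64 packets dropped per scan
```

Under these assumptions the scan costs 64 lines for 1024 buckets and 8 lines for 128 buckets, and a batch of 64 brings the 1024-bucket case down to about one miss per dropped packet.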
>>>>
>>>>
>>>> On Thu, 05 May 2016 12:23:27 -0700 Eric Dumazet <eric.dumazet at gmail.com> wrote:
>>>>
>>>> > On Thu, 2016-05-05 at 19:25 +0300, Roman Yeryomin wrote:
>>>> > > On 5 May 2016 at 19:12, Eric Dumazet <eric.dumazet at gmail.com> wrote:
>>>> > > > On Thu, 2016-05-05 at 17:53 +0300, Roman Yeryomin wrote:
>>>> > > >
>>>> > > >>
>>>> > > >> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 1024p flows 1024
>>>> > > >> quantum 1514 target 5.0ms interval 100.0ms ecn
>>>> > > >>  Sent 12306 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
>>>> > > >>  backlog 0b 0p requeues 0
>>>> > > >>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>> > > >>   new_flows_len 0 old_flows_len 0
>>>> > > >
>>>> > > >
>>>> > > > Limit of 1024 packets and 1024 flows is not wise I think.
>>>> > > >
>>>> > > > (If all buckets are in use, each bucket has a virtual queue of 1 packet,
>>>> > > > which is almost the same as having no queue at all)
>>>> > > >
>>>> > > > I suggest to have at least 8 packets per bucket, to let Codel have a
>>>> > > > chance to trigger.
>>>> > > >
>>>> > > > So you could either reduce number of buckets to 128 (if memory is
>>>> > > > tight), or increase limit to 8192.
>>>> > >
>>>> > > Will try, but what I've posted is default, I didn't change/configure that.
>>>> >
>>>> > fq_codel has a default of 10240 packets and 1024 buckets.
>>>> >
>>>> > http://lxr.free-electrons.com/source/net/sched/sch_fq_codel.c#L413
>>>> >
>>>> > If someone changed that in the linux variant you use, he probably should
>>>> > explain the rationale.
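
Eric's rule of thumb above (at least 8 packets per bucket) is easy to check against the settings mentioned in this thread. A minimal sketch, with made-up helper names; the threshold of 8 comes from his mail, everything else is plain division:

```python
MIN_PACKETS_PER_BUCKET = 8  # rule of thumb from Eric's mail above


def packets_per_bucket(limit, flows):
    """Packets available to each bucket if every bucket is in use."""
    return limit // flows


def suggest(limit, flows, minimum=MIN_PACKETS_PER_BUCKET):
    """Return (flows to use with this limit, limit to use with these flows)."""
    return limit // minimum, flows * minimum


# (limit, flows) pairs taken from this thread.
settings = {
    "kernel default": (10240, 1024),
    "OpenWRT default": (1024, 1024),
    "Roman's test": (256, 16),
}

for name, (limit, flows) in settings.items():
    per_bucket = packets_per_bucket(limit, flows)
    verdict = "ok" if per_bucket >= MIN_PACKETS_PER_BUCKET else "too few"
    print(name, per_bucket, verdict)

print(suggest(1024, 1024))
```

For the OpenWRT default this gives 1 packet per bucket, and `suggest(1024, 1024)` returns the same two fixes Eric names: 128 buckets, or a limit of 8192. Note the ratio alone does not say whether the total limit is large enough for normal use.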
>>>
>>> --
>>> Best regards,
>>>   Jesper Dangaard Brouer
>>>   MSc.CS, Principal Kernel Engineer at Red Hat
>>>   Author of http://www.iptv-analyzer.org
>>>   LinkedIn: http://www.linkedin.com/in/brouer



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


