Unicast packets stop being transmitted to a particular station, under load, when WPA2 is enabled

Adrian Chadd adrian at freebsd.org
Mon May 12 07:49:38 PDT 2014


Hi,

I've faced this in FreeBSD. It's only very recently that I found some
corner-case block-ack window tracking bugs that only occurred during
periods of extreme packet loss. What I would do when this happens:

* I'd dump out the entire software queue for the hung station, plus the
hardware queue state. The hardware queue state tends to be quite small
on ath9k/FreeBSD as we artificially limit the queue depth.
* I also hacked up a rolling log of all the transmit, transmit
completion, baw add, and baw remove log entries, so I could go back in
history to find where the hole came from.
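
To make the rolling-log idea concrete, here is a minimal sketch in
Python. It is not the FreeBSD code: the class name, the event names, and
the capacity are all made up for illustration. It keeps a fixed-size
history of events and reports sequence numbers that were added to the
block-ack window but never removed within the retained history, which is
roughly what a "hole" would look like in such a log.

```python
from collections import deque


class RollingTxLog:
    """Fixed-size history of transmit-path events (hypothetical names)."""

    KINDS = ("tx", "tx_complete", "baw_add", "baw_remove")

    def __init__(self, capacity=1024):
        # A bounded deque drops the oldest entry once capacity is reached.
        self.events = deque(maxlen=capacity)
        self.counter = 0

    def record(self, kind, seqno):
        """Append one event, tagged with a monotonically increasing index."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        self.counter += 1
        self.events.append((self.counter, kind, seqno))

    def history_for(self, seqno):
        """Return every retained event for one sequence number, oldest first."""
        return [event for event in self.events if event[2] == seqno]

    def unmatched_baw_adds(self):
        """Sequence numbers with a baw_add but no later baw_remove in the log."""
        pending = {}
        for index, kind, seqno in self.events:
            if kind == "baw_add":
                pending[seqno] = index
            elif kind == "baw_remove":
                pending.pop(seqno, None)
        return sorted(pending)
```

Once unmatched_baw_adds() names a suspect sequence number, history_for()
shows what happened to that frame. A real log would also need to cope
with 802.11 sequence numbers wrapping, which this sketch ignores.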

The last one I found took a day of active torrenting and around 500
million log line entries just to trigger. :-)

As for firmware, lemme respond to that separately.



-a

On 11 May 2014 19:46, Dave Taht <dave.taht at gmail.com> wrote:
> On Sun, May 11, 2014 at 7:42 PM, Dave Taht <dave.taht at gmail.com> wrote:
>> On Sun, May 11, 2014 at 7:29 PM, Avery Pennarun <apenwarr at gmail.com> wrote:
>>> On Sun, May 11, 2014 at 10:07 PM, Dave Taht <dave.taht at gmail.com> wrote:
>>>> I have been failing to find and fix a very similar problem on the
>>>> ath9k for many months now. What I see happening there is that one or
>>>> more of the hardware queues locks up and stops transmitting traffic.
>>>> So, for example, I might get traffic destined for BK (the background
>>>> queue, traffic marked CS1) hung, but BE remains fine. Most recently I
>>>> was able to lock up the VO, VI, and BK queues by exercising it
>>>> overnight with multiple copies of the rrul test.
>>>>
>>>> I don't know much about how the hardware queues are configured on
>>>> ath10k, but you can land stuff in each queue by marking with CS0, CS1,
>>>> CS5, and CS7 (BE,BK,VI,VO) on mac80211 based devices.
>>>
>>> I think my problem may be something else.  In particular, it seems to
>>> affect each station separately, and doesn't seem to happen if I
>>> disable encryption.  (Does your ath9k problem trigger if encryption is
>>> turned off?)
>>
>> No. WPA2 only so far.
>>
>> I will try multiple stations to see if I can get it to occur only on a
>> per-station basis. (there are hardware queues for multiple forms of
>> traffic not just the visible VO, VI, BE, and BK queues)
>>
>>> I also have an ath9k device in the same AP on 2.4 GHz,
>>> and it doesn't trigger there either.  I haven't attempted to see if
>>> your bug triggers on that one though :)
>>
>> It really takes work to trigger it, and I can now do it on both
>> 2.4 GHz and 5 GHz. Getting it down to under 6 hours of high traffic
>> recently was an accomplishment.
>>
>>>> I can make it happen more often, faster, if the associated station has
>>>> considerable distance and less signal strength than nearby.
>>
>> There are rarely executed code paths controlling how noise rejection
>> works, and all sorts of hardware issues in configuring it that vary
>> between chipset versions. A ton of patches landed in head with an
>> update to the ANI values that worked on newer versions of the ath9k
>> chipset and later had to be modified to deal with older ath9k chipsets.
>>
>>> I just checked, and my bug seems to trigger more often when I'm at a
>>> longer distance (my macbook says about -60 RSSI) and less often at a
>>> closer distance (currently macbook reports RSSI of -41).  Not sure if
>>> this is related to increased retransmits or decreased speed or
>>> something else.
>>>
>>>> Blow it up with netperf-wrappers -H someserver rrul...
>>>
>>> That's not a bad idea... I really need to get netperf-wrappers going
>>> for some stress testing :)
>>
>> The hardware queues are rarely tested.
>>
>> If you just want to blow up one queue at a time, the syntax for netperf is
>
> netperf -H someserver -t the_test -Y CS1,CS1 # or CS5,CS5 or CS6, CS6
>
> I have been flooding all the queues with both -t TCP_STREAM and TCP_MAERTS
> to make it happen using the rrul test, but I have also made it happen
> with BE only.
>
> Getting one data point every day or so makes for slow debugging.
>
>>
>> You can also arbitrarily do tos-setting with iptables.
>>
>> dnsmasq uses CS6 by default, btw, so its DHCP packets land by default
>> in VO and then get shuffled over to the multicast hw queue.
>>
>>
>>>
>>> Have fun,
>>>
>>> Avery
>>
>>
>>
>> --
>> Dave Täht
>>
>> NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
>
> _______________________________________________
> ath10k mailing list
> ath10k at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath10k
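
For anyone who wants to steer test traffic into one queue from their own
script rather than via netperf -Y, the marking discussed in the quoted
text can be sketched in Python as below. The function names are made up
for illustration. The priority table assumes the usual mac80211 default
of taking the top three DSCP bits as the 802.1d user priority; a driver
or a QoS map may classify differently, so treat it as an assumption.
socket.IP_TOS is assumed to be available, as it is on Linux.

```python
import socket

# 802.1d user priority to WMM access category (assumed mac80211 default).
UP_TO_AC = {0: "BE", 1: "BK", 2: "BK", 3: "BE",
            4: "VI", 5: "VI", 6: "VO", 7: "VO"}


def class_selector_to_tos(n):
    """Return the IP TOS byte for DSCP class selector CSn (n in 0..7)."""
    if not 0 <= n <= 7:
        raise ValueError("class selector must be between 0 and 7")
    dscp = n << 3      # CSn is the 6-bit DSCP value n * 8
    return dscp << 2   # DSCP occupies the upper six bits of the TOS byte


def tos_to_access_category(tos):
    """Map a TOS byte to the access category its top three bits select."""
    user_priority = (tos >> 5) & 0x7
    return UP_TO_AC[user_priority]


def open_marked_udp_socket(n):
    """Create a UDP socket whose outgoing packets are marked CSn."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS,
                    class_selector_to_tos(n))
    return sock
```

Under that assumption CS0, CS1, CS5 and CS7 map to BE, BK, VI and VO,
matching the list in the quoted message, and CS6 also maps to VO.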


