help troubleshooting low throughput

Fri May 23 00:42:51 PDT 2014

On 22 May 2014 20:37, Tim Harvey <tharvey at gateworks.com> wrote:
> On Thu, May 22, 2014 at 3:08 AM, Michal Kazior <michal.kazior at tieto.com> wrote:
>> On 22 May 2014 11:46, Tim Harvey <tharvey at gateworks.com> wrote:
>>> On Thu, May 22, 2014 at 2:39 AM, Tim Harvey <tharvey at gateworks.com> wrote:
>>>> Greetings,
>>>>
>>>> I could use some help troubleshooting a low throughput issue. I'm
>>>> currently using the following:
>>>>  - UNEX DAXA-O1 11ac/n/a 3x3 MIMO qca988x hw2.0
>>>> http://www.unex.com.tw/product/daxa-o1
>>>>  - 80MHz channel w/o local interference
>>>>  - ath10k git 0dbbb028a7c461777bf4a0d53780e539e6f40e14 (May 16)
>>>>  - up-to-date git of hostapd/wpa_supplicant/iw
>>>>  - fw 10.1.467.2-1 api 2 htt 2.1
>>>>  - infrastructure mode using
>>>> http://wireless.kernel.org/en/users/Drivers/ath10k/configuration#Full_hostapd_configuration
>>
>> Did you just copy&paste the example config file (and updated
>> interface=) or did you do something extra?
>
> Hi Michal,
>
> I disabled bridge mode, DFS, wpa, wps and added
> 'vht_oper_centr_freq_seg0_idx=42' which appears to be something new
> that is required or hostapd bails out:

VHT requires the segment's center frequency (or rather its channel
number, in hostapd) to be provided. Since you have channel=36, the
center channel for 80MHz bandwidth is 36+6 = 42.
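
A minimal sketch of that arithmetic in Python, in case it helps when
picking other channels. The function name and the segment table are
illustrative assumptions (the usual 5GHz 80MHz blocks; which of them
are actually usable depends on the regulatory domain), not something
taken from hostapd:

```python
# 5 GHz 80 MHz segments, listed by their lowest 20 MHz channel number.
# Each segment spans four 20 MHz channels: low, low+4, low+8, low+12.
VHT80_SEGMENT_STARTS = (36, 52, 100, 116, 132, 149)


def vht80_center_channel(primary_channel: int) -> int:
    """Return the 80 MHz center channel index for a 20 MHz primary channel."""
    for low in VHT80_SEGMENT_STARTS:
        if primary_channel in (low, low + 4, low + 8, low + 12):
            # The segment center is 30 MHz above the lowest channel's
            # center; channel numbers advance by 1 per 5 MHz, hence +6.
            return low + 6
    raise ValueError(f"channel {primary_channel} is not in a known 80 MHz segment")


if __name__ == "__main__":
    for ch in (36, 44, 149):
        print(ch, "->", vht80_center_channel(ch))
```

Note that the "+6" only applies to the lowest channel of a segment;
for channel=44 the center index is still 42.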

> ### hostapd configuration file
[..]

I'd try simply:

> ht_capab=[HT40+][SHORT-GI-20][SHORT-GI-40]
> vht_capab=[MAX-MPDU-11454][SHORT-GI-80][MAX-A-MPDU-LEN-EXP7]

Otherwise it looks fine to me.

[...]

> It's showing 80MHz MCS 5 (between 5 and 8)
>
> root@sta-97:~# iw wlan0 station dump
> Station 60:02:b4:9d:99:7f (on wlan0)
>         inactive time:  590 ms
>         rx bytes:       160004
>         rx packets:     1824
>         tx bytes:       9832
>         tx packets:     87
>         tx retries:     0
>         tx failed:      0
>         signal:         -53 dBm
>         signal avg:     -52 dBm
>         tx bitrate:     6.0 MBit/s
>         rx bitrate:     975.0 MBit/s VHT-MCS 7 80MHz short GI VHT-NSS 3
>         authorized:     yes
>         authenticated:  yes
>         preamble:       long
>         WMM/WME:        yes
>         MFP:            no
>         TDLS peer:      no
>
> ap is showing 80MHz width between MCS 5 and MCS 8:
>
> root@ap-99:~# iw wlan0 station dump
> Station 60:02:b4:9d:99:62 (on wlan0)
>         inactive time:  0 ms
>         rx bytes:       275591916
>         rx packets:     182178
>         tx bytes:       4394890
>         tx packets:     50807
>         tx retries:     0
>         tx failed:      0
>         signal:         -60 dBm
>         signal avg:     -60 dBm
>         tx bitrate:     6.0 MBit/s
>         rx bitrate:     702.0 MBit/s VHT-MCS 5 80MHz VHT-NSS 3
>         authorized:     yes
>         authenticated:  yes
>         preamble:       long
>         WMM/WME:        yes
>         MFP:            no
>         TDLS peer:      no

This looks good. So rate control is doing rather fine. 3 spatial
streams (VHT-NSS 3) are at work.
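
To see where the 975.0 and 702.0 MBit/s figures come from, here is a
rough sketch in Python that reproduces them from the MCS index, stream
count and guard interval. The function name is made up for
illustration, and it assumes the standard VHT parameters (234 data
subcarriers at 80MHz, 4.0us symbols with long GI, 3.6us with short
GI). It does not check for the few MCS/NSS/width combinations that the
standard disallows:

```python
from fractions import Fraction

# (coded bits per subcarrier, coding rate) for VHT-MCS 0..9.
VHT_MCS = {
    0: (1, Fraction(1, 2)),
    1: (2, Fraction(1, 2)),
    2: (2, Fraction(3, 4)),
    3: (4, Fraction(1, 2)),
    4: (4, Fraction(3, 4)),
    5: (6, Fraction(2, 3)),
    6: (6, Fraction(3, 4)),
    7: (6, Fraction(5, 6)),
    8: (8, Fraction(3, 4)),
    9: (8, Fraction(5, 6)),
}

# Data subcarriers per channel width in MHz.
DATA_SUBCARRIERS = {20: 52, 40: 108, 80: 234, 160: 468}


def vht_phy_rate_mbps(mcs, nss, width_mhz=80, short_gi=False):
    """Nominal VHT PHY rate in Mbit/s (no validity check of the combination)."""
    bits, coding = VHT_MCS[mcs]
    symbol_ns = 3600 if short_gi else 4000
    data_bits_per_symbol = DATA_SUBCARRIERS[width_mhz] * bits * coding * nss
    # bits per nanosecond times 1000 equals Mbit/s.
    return float(data_bits_per_symbol * 1000 / symbol_ns)


if __name__ == "__main__":
    print(vht_phy_rate_mbps(7, 3, short_gi=True))   # station side dump
    print(vht_phy_rate_mbps(5, 3, short_gi=False))  # AP side dump
```

These are PHY rates; the throughput seen by iperf will always be lower
because of protocol and aggregation overhead.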

>>>> I'm using iperf for throughput tests and getting no more than 220mbps
>>>> best case, typically more like 120mbps. The rx bitrate bounces around
>>>> MCS 5 to 8 and shows 3 spatial streams so I would be expecting a much
>>>> higher throughput. The cards are in boards with a quad-core ARM 1GHz
>>>> Cortex-A9 CPU and there is no indication the system is bottle-necked.
>>>> There are no other kernel modules loaded other than
>>>> ath10k_pci/ath10k_core/ath and debugging is disabled.
>>
>> Currently ath10k doesn't really scale much with number of CPUs. There
>> are basically two tasklets that could split the work just a little
>> bit, but this requires interrupt spreading. From what I know some ARM
>> chips can't do that so ath10k ends up using only single CPU all the
>> time. 1GHz of an A9 should still be enough to get you 500mbps+ though.
>
> interesting. I see [ath10k_wq] in ps, what is the other task? ath10k
> will just register 1 interrupt for PCI, how would you spread that if
> only 1 ath10k device is in the system?

ath10k_wq is the workqueue. It is not related to tasklets at all.

Even if you have a single interrupt, your controller may spread
interrupts across sockets/cores/threads. So one time the device issues
an interrupt CPU0 gets interrupted, and another time CPU1 gets
interrupted.
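
One way to check whether any spreading happens is to look at the
per-CPU counters in /proc/interrupts. Below is a small Python sketch
that sums them for a given handler name. The sample text and IRQ
numbers are invented for illustration, and "ath10k_pci" as the name
shown in /proc/interrupts is an assumption; check what your system
actually lists:

```python
def irq_counts_per_cpu(interrupts_text, name):
    """Sum per-CPU interrupt counts for all IRQ lines naming `name`."""
    lines = interrupts_text.splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        fields = line.split()
        # Everything after the per-CPU counters is controller/handler text;
        # shared IRQs list several handlers separated by commas.
        trailer = [f.rstrip(",") for f in fields[1 + len(cpus):]]
        if name not in trailer:
            continue
        for cpu, count in zip(cpus, fields[1:1 + len(cpus)]):
            totals[cpu] += int(count)
    return totals


# Hypothetical excerpt in the /proc/interrupts layout (values made up).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 29:      51234      49877      50012      50321       GIC  twd
155:     982113          0          0          0       GIC  ath10k_pci
"""

if __name__ == "__main__":
    print(irq_counts_per_cpu(SAMPLE, "ath10k_pci"))
```

On a real system you would pass in the contents of /proc/interrupts
read before and after an iperf run. If only the CPU0 column grows, the
interrupt is not being spread.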

> I would agree that a 1GHz CortexA9 should be able to do well. The top
> application shows that only CPU0 is being utilized and never more than
> 25% or so (softirq mostly) and mostly idle. So I don't think this is
> any sort of CPU bottleneck.

The 25% sounds fishy considering you have a quad-core CPU. I'm not
really sure whether top (or the top you use, for that matter) reports
percentages relative to a single CPU or globally. If it is global,
25% on four cores could be one core running flat out. I would
certainly investigate this. I recall vmstat sums everything up, i.e.
if it says "25% sys" then it means "25% of your entire CPU set is
doing sys, regardless of which core it is".
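
To take the tool out of the equation, the per-core numbers can be
computed directly from two snapshots of /proc/stat. A minimal Python
sketch follows; the function names and the sample snapshots are made
up for illustration, and iowait is counted as idle time here, which is
a choice rather than a rule:

```python
def parse_proc_stat(text):
    """Map 'cpu', 'cpu0', ... to (busy_jiffies, total_jiffies)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # Columns: user nice system idle iowait irq softirq steal
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        result[fields[0]] = (total - idle, total)
    return result


def busy_percent(before, after):
    """Percent of time spent non-idle between two parsed snapshots."""
    out = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before[name]
        delta = total1 - total0
        out[name] = 100.0 * (busy1 - busy0) / delta if delta else 0.0
    return out


# Hypothetical snapshots: cpu0 spends the whole interval in softirq,
# the other three cores stay idle.
BEFORE = """\
cpu  100 0 100 800 0 0 0 0
cpu0 25 0 25 200 0 0 0 0
cpu1 25 0 25 200 0 0 0 0
cpu2 25 0 25 200 0 0 0 0
cpu3 25 0 25 200 0 0 0 0
"""
AFTER = """\
cpu  100 0 100 1100 0 0 100 0
cpu0 25 0 25 200 0 0 100 0
cpu1 25 0 25 300 0 0 0 0
cpu2 25 0 25 300 0 0 0 0
cpu3 25 0 25 300 0 0 0 0
"""

if __name__ == "__main__":
    print(busy_percent(parse_proc_stat(BEFORE), parse_proc_stat(AFTER)))
```

In this made-up case the aggregate "cpu" line comes out at 25% while
cpu0 is at 100%, which is exactly the situation a global 25% figure
would hide.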

>> Did you run TCP and/or UDP tests? What direction did you test
>> (station->ap / ap->station)?
>
> both - the best throughput I see is appx 220mbps TCP and 260mbps UDP
> and this is consistent in both directions.

Did you try using the -P switch to send parallel streams? E.g. -u -b
100M -P5 for UDP?

Also, now that I think about it, you don't have a bridge. This means
your AP system has to perform a lot more packet mangling, which I
guess can be pretty taxing for the A9.

>>>> Using tcpdump/wireshark to inspect the radiotap headers I only see
>>>> packets with 'Antenna: 0' - Does this field indicate what transmitter
>>>> the pkt was received on and indicate I'm only receiving from a single
>>>> transmitter?
>>
>> Antenna callbacks aren't implemented in master branch yet. You can
>> check ath-next-test for that or cherry-pick Ben's patches.
>
> ok - I will do that. Does this field indicate which antenna the frame
> was sent out or received from?

I'm not familiar with this stuff, but I guess so.

>> You can verify max number of spatial streams with iw by looking at `HT
>> TX/RX MCS rate indexes supported:` and/or VHT TX/RX MCS sets.
>>
>
> These look fine (see below) but is there a way to actually prove (i.e.
> by radiotap) that frames come in/out of multiple antennas?
>
[...]
>                 VHT RX MCS set:
>                         1 streams: MCS 0-9
>                         2 streams: MCS 0-9
>                         3 streams: MCS 0-9
>                         4 streams: not supported
>                         5 streams: not supported
>                         6 streams: not supported
>                         7 streams: not supported
>                         8 streams: not supported
>                 VHT RX highest supported: 0 Mbps
>                 VHT TX MCS set:
>                         1 streams: MCS 0-9
>                         2 streams: MCS 0-9
>                         3 streams: MCS 0-9
>                         4 streams: not supported
>                         5 streams: not supported
>                         6 streams: not supported
>                         7 streams: not supported
>                         8 streams: not supported
[...]

Device capabilities look fine (iw station dump you posted above
already verified we're good though).

>>>> I've also noticed that 'iw wlan0 station dump' statistics show in the
>>>> rx bitrate stat that sometimes SGI is flagged and sometimes it's not -
>>>> should I expect this to change like this? I was under the impression
>>>> that was a static configuration.
>>
>> I think ath10k's hw rate control is free to pick SGI/LGI
>> (assuming target station supports SGI at all).
>
> I always assumed a user would want to force one way or another - does
> the rate control engine try to optimize by using the short guard
> interval (if supported) and then back off to long if it detects lots
> of retries?

Beats me.

At best you can force a single tx bitrate mcs and force it to LGI/SGI
(firmware interface limitation). But that's it I guess.

>>> I neglected to mention that on whichever end is running the iperf
>>> server (receiving) I do see periodic 'failed to pop...' warnings, a
>>> pair of them a couple times a minute. I'm not quite clear if this is a
>>> hardware issue or something else:
>>>
>>> [ 3763.131280] ath10k: failed to pop amsdu from htt rx ring -22
>>> [ 3769.081093] ath10k: failed to pop chained msdus, dropping
>>> [ 3769.086575] ath10k: failed to pop amsdu from htt rx ring -22
>>> [ 3781.092383] ath10k: failed to pop chained msdus, dropping
>>> [ 3781.097869] ath10k: failed to pop amsdu from htt rx ring -22
>>> [ 3784.367225] ath10k: failed to pop chained msdus, dropping
>>> [ 3784.372735] ath10k: failed to pop amsdu from htt rx ring -22
>>> [ 3809.501484] ath10k: failed to pop chained msdus, dropping
>>> [ 3809.506993] ath10k: failed to pop amsdu from htt rx ring -22
>>> [ 3821.723404] ath10k: failed to pop chained msdus, dropping
>>> [ 3821.728914] ath10k: failed to pop amsdu from htt rx ring -22
>>> [ 3832.067136] ath10k: failed to pop chained msdus, dropping
>>> [ 3832.072690] ath10k: failed to pop amsdu from htt rx ring -22
>>> [ 3833.632859] ath10k: failed to pop chained msdus, dropping
>>> [ 3833.638341] ath10k: failed to pop amsdu from htt rx ring -22
>>> [ 3837.940414] ath10k: failed to pop chained msdus, dropping
>>> [ 3837.945982] ath10k: failed to pop amsdu from htt rx ring -22
>>> [ 3844.100514] ath10k: failed to pop chained msdus, dropping
>>
>> You don't have to worry about these messages.
>>
>> This was recently introduced by one of my patches to further analyze a
>> bug that Avery was seeing (kernel panic due to skb_push). I plan on
>> making a patch to clean this up.
>>
>> Anyway, thanks for letting me know :-)
>
> ok - I saw the patch but from the commit message I thought it was
> indicating these were failures. Thanks for the explanation.

Yeah. I originally thought these shouldn't pop up until Really Bad
Things [tm] happen but I was proved wrong.

Michał