Need to get msdu-chaining working.

Michal Kazior michal.kazior at tieto.com
Wed Mar 5 03:09:14 EST 2014


On 4 March 2014 17:59, Ben Greear <greearb at candelatech.com> wrote:
> On 03/03/2014 11:36 PM, Michal Kazior wrote:
>> On 3 March 2014 23:13, Ben Greear <greearb at candelatech.com> wrote:
>>> On 02/27/2014 11:36 PM, Michal Kazior wrote:
>>>
>>>>>> rx buffer and is split across the popped amsdu list. I suspect only
>>>>>> the first msdu in chain has the htt_rx_desc and all others do not
>>>>>> (this is what the current code does, but you'll need to verify that).
>>>>>>
>>>>>> I would try to concatenate all msdus into one (lots of memcpy :( ) or
>>>>>> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
>>>>>> single buffer (hopefully FW/HW is capable of doing that).
>>>
>>> Just FYI:  At least on my firmware in raw rx mode, I increased the
>>> HTT_RX_BUF_SIZE (to 4 * 1920) and at least some chaining remains.
>>> Performance did not change noticeably.  I'm using fairly powerful
>>> core i7 processor systems, so maybe the memcpy doesn't
>>> make enough difference to notice in my tests.
>>>
>>> I did not put any effort into figuring out why.
>>
>> Getting rid of memcpy() was a huge performance win for AP135 and its
>> MIPS processor.
>
> No doubt, but at this point, my problems appear to lie elsewhere.
>
>>> I'm currently getting about 540Mbps upload TCP goodput,
>>> and only 420Mbps download TCP goodput.  Not sure why
>>> the discrepancy, but perhaps the rx raw performance
>>> is worse for a variety of reasons.  My firmware changes
>>> to support multiple stations to same AP may also be slowing
>>> things down, though these numbers are from a single station
>>> test...
>>
>> Hmm, I assume you test this without any bridging. It's probably going
>> to be a little slower due to tx timings being directly visible to the
>> TCP subsystem because both TCP and ath10k are locally on the same
>> machine. You could try moving the actual TCP endpoints behind bridges.
>>
>> Or you're actually seeing the memcpy() at work...
>>
>> Did you try to test performance on vanilla driver/firmware?
>
> I have used vanilla firmware on AP for all tests, because my firmware
> will not do AP mode on WLE900VX for some reason.  Using my slightly modified
> driver has no noticeable difference (and it now works virtually identical
> to upstream code when not using my modified firmware).
>
> For station machine, vanilla firmware performs no better than my firmware,
> and I see the same issue where upload is 150Mbps or so faster than
> download.
>
> I tried putting TCP/UDP endpoints on AP, and using AP as bridge, and both
> cases have similar throughput.  Interestingly to me, UDP and TCP have similar
> throughput, so it is unlikely that we are actually hitting limits on
> the spectrum (otherwise, UDP would do better because it has no ACK packets
> and generally runs much faster total throughput on wifi in my experience
> with /a/b/g/n NICs).
>
> With vanilla firmware, there should be little to no amsdu packets
> (I assume), so it is unlikely to be related to memcpy.  perf top

Incorrect. There's actually quite a lot of amsdu with vanilla firmware
(keep in mind I refer to nwifi rx). In the early days ath10k was
stitching msdus back together just to have them torn apart again in
mac80211, and this was hurting performance.
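
As a rough illustration of why that stitching costs CPU time, here is a
minimal userspace sketch of coalescing a chain of rx buffers into one
contiguous frame. It is not ath10k code: struct rx_buf, chain_len() and
chain_coalesce() are hypothetical names standing in for the real sk_buff
chain handling, and malloc() stands in for skb allocation. The point is
only that every payload byte gets copied once more per frame.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one rx buffer in a chain (not struct sk_buff). */
struct rx_buf {
	const unsigned char *data;	/* payload bytes held by this buffer */
	size_t len;			/* number of valid bytes in data */
	struct rx_buf *next;		/* next buffer in the chain, or NULL */
};

/* Sum the payload length of every buffer in the chain. */
size_t chain_len(const struct rx_buf *head)
{
	size_t total = 0;

	for (; head; head = head->next)
		total += head->len;
	return total;
}

/*
 * Copy the whole chain into one newly allocated contiguous buffer.
 * Returns NULL (and sets *out_len to 0) for an empty chain or on
 * allocation failure. The caller frees the returned buffer.
 */
unsigned char *chain_coalesce(const struct rx_buf *head, size_t *out_len)
{
	size_t total = chain_len(head);
	size_t off = 0;
	unsigned char *flat;

	*out_len = 0;
	if (total == 0)
		return NULL;

	flat = malloc(total);
	if (!flat)
		return NULL;

	/* One memcpy per chained buffer: this is the per-frame copy cost. */
	for (; head; head = head->next) {
		if (head->len)
			memcpy(flat + off, head->data, head->len);
		off += head->len;
	}

	*out_len = total;
	return flat;
}
```

Passing a two-buffer chain holding "ab" and "cd" to chain_coalesce()
yields a 4-byte buffer containing "abcd". Avoiding this copy altogether
(handing the msdus up individually) is what removes the overhead.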


> shows no obvious hot spots in download test (running about 380Mbps
> in this case):
>
>   2.24%  [kernel]                      [k] swiotlb_tbl_unmap_single
>   1.88%  [kernel]                      [k] do_raw_spin_lock
>   1.88%  [kernel]                      [k] ioread32
>   1.84%  [kernel]                      [k] tcp_packet
>   1.53%  [mac80211]                    [k] ieee80211_rx_handlers
>   1.31%  [kernel]                      [k] copy_user_generic_string
>   1.19%  [ath10k_core]                 [k] ath10k_htt_rx_amsdu.isra.29
>   1.14%  btserver                      [.] do_big_while()
>   1.09%  [kernel]                      [k] _raw_spin_lock_irqsave
>
> What throughputs are you seeing, and what NICs are you using for AP
> and stations?

With current master branch CUS223-CUS223 should get over 800Mbps in
udp tx/rx and 700Mbps in tcp tx/rx with cabled setup on a poor AP135.
Since AP135's CPU doesn't have any time to idle around I expect x86 to
perform better and I don't think OTA should be terribly slower in a
reasonably clean environment.


Michał


