Need to get msdu-chaining working.
Michal Kazior
michal.kazior at tieto.com
Wed Mar 5 03:09:14 EST 2014
On 4 March 2014 17:59, Ben Greear <greearb at candelatech.com> wrote:
> On 03/03/2014 11:36 PM, Michal Kazior wrote:
>> On 3 March 2014 23:13, Ben Greear <greearb at candelatech.com> wrote:
>>> On 02/27/2014 11:36 PM, Michal Kazior wrote:
>>>
>>>>>> rx buffer and is split across the popped amsdu list. I suspect only
>>>>>> the first msdu in chain has the htt_rx_desc and all other have not
>>>>>> (this is what the current code does, but you'll need to verify that).
>>>>>>
>>>>>> I would try to concatenate all msdus into one (lots of memcpy :( ) or
>>>>>> increase the HTT_RX_BUF_SIZE so that A-MSDU frames can fit into a
>>>>>> single buffer (hopefully FW/HW is capable of doing that).
>>>
>>> Just FYI: At least on my firmware in raw rx mode, at least some chaining
>>> remains even after increasing HTT_RX_BUF_SIZE (to 4 * 1920).
>>> Performance did not change noticeably. I'm using fairly powerful
>>> core i7 processor systems, so maybe the memcpy doesn't
>>> make enough difference to notice in my tests.
>>>
>>> I did not put any effort into figuring out why.
>>
>> Getting rid of memcpy() was a huge performance win for AP135 and its
>> MIPS processor.
>
> No doubt, but at this point, my problems appear to lie elsewhere.
>
>>> I'm currently getting about 540Mbps upload TCP goodput,
>>> and only 420Mbps download TCP goodput. Not sure why
>>> the discrepancy, but perhaps the rx raw performance
>>> is worse for a variety of reasons. My firmware changes
>>> to support multiple stations to same AP may also be slowing
>>> things down, though these numbers are from a single station
>>> test...
>>
>> Hmm, I assume you test this without any bridging. It's probably going
>> to be a little slower due to tx timings being directly visible to the
>> TCP subsystem because both TCP and ath10k are locally on the same
>> machine. You could try moving the actual TCP endpoints behind bridges.
>>
>> Or you're actually seeing the memcpy() at work...
>>
>> Did you try to test performance on vanilla driver/firmware?
>
> I have used vanilla firmware on AP for all tests, because my firmware
> will not do AP mode on WLE900VX for some reason. Using my slightly modified
> driver makes no noticeable difference (and it now works virtually identically
> to upstream code when not using my modified firmware).
>
> For station machine, vanilla firmware performs no better than my firmware,
> and I see the same issue where upload is 150Mbps or so faster than
> download.
>
> I tried putting TCP/UDP endpoints on AP, and using AP as bridge, and both
> cases have similar throughput. Interestingly to me, UDP and TCP have similar
> throughput, so it is unlikely that we are actually hitting limits on
> the spectrum (otherwise, UDP would do better because it has no ACK packets
> and generally runs much faster total throughput on wifi in my experience
> with /a/b/g/n NICs).
>
> With vanilla firmware, there should be little to no amsdu packets
> (I assume), so it is unlikely to be related to memcpy. perf top

Incorrect. There's actually quite a lot of amsdu with vanilla firmware
(keep in mind I refer to nwifi rx). In the early days ath10k was
stitching msdus back together just to have them torn apart again in
mac80211, and this was hurting performance.
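
To make the cost concrete, here is a minimal sketch of what "stitching" a
chain means: every fragment is copied once more into a single buffer. This is
plain C with hypothetical names (struct frag, stitch_chain), not the actual
ath10k code, and it assumes the destination buffer is allocated by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one rx buffer in a chain; not an ath10k type. */
struct frag {
    const unsigned char *data;
    size_t len;
};

/*
 * Copy every fragment of a chain into dst, back to back.
 * Returns the number of bytes written, or 0 if dst is too small
 * (an empty chain also returns 0).
 */
static size_t stitch_chain(unsigned char *dst, size_t dst_len,
                           const struct frag *chain, size_t nfrags)
{
    size_t off = 0;
    size_t i;

    assert(dst != NULL && chain != NULL);

    for (i = 0; i < nfrags; i++) {
        /* off never exceeds dst_len, so this subtraction cannot wrap. */
        if (chain[i].len > dst_len - off)
            return 0;
        /* One extra copy per fragment: this is the overhead in question. */
        memcpy(dst + off, chain[i].data, chain[i].len);
        off += chain[i].len;
    }
    return off;
}
```

Every payload byte gets touched one extra time here, which is cheap on a
core i7 but adds up quickly on a small MIPS core.
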
> shows no obvious hot spots in download test (running about 380Mbps
> in this case):
>
> 2.24% [kernel] [k] swiotlb_tbl_unmap_single
> 1.88% [kernel] [k] do_raw_spin_lock
> 1.88% [kernel] [k] ioread32
> 1.84% [kernel] [k] tcp_packet
> 1.53% [mac80211] [k] ieee80211_rx_handlers
> 1.31% [kernel] [k] copy_user_generic_string
> 1.19% [ath10k_core] [k] ath10k_htt_rx_amsdu.isra.29
> 1.14% btserver [.] do_big_while()
> 1.09% [kernel] [k] _raw_spin_lock_irqsave
>
> What throughputs are you seeing, and what NICs are you using for AP
> and stations?

With the current master branch, CUS223-CUS223 should get over 800 Mbps in
UDP tx/rx and 700 Mbps in TCP tx/rx with a cabled setup on a poor AP135.
Since the AP135's CPU has no idle time left at those rates, I expect x86
to perform better, and I don't think OTA should be terribly slower in a
reasonably clean environment.

Michał