htt rx stopped. cannot recover

Michal Kazior michal.kazior at tieto.com
Mon Nov 17 22:56:47 PST 2014


On 18 November 2014 02:20, Pushpal Sidhu <psidhu at gateworks.com> wrote:
> On Fri, Nov 14, 2014 at 12:31 AM, Michal Kazior <michal.kazior at tieto.com> wrote:
>> I've used ath10k AP in a bridge many times and haven't seen this issue yet.
>
> I just found that I can reproduce the problem passing traffic through
> from AP <-wifi-> STA with no bridging. The reason I didn't see the
> problem before is that it takes considerably longer to reproduce and I
> wasn't necessarily looking for it (65 seconds vs 5 seconds).
>
>> Putting an interface into a bridge enables promiscuous mode on it. In
>> ath10k this is handled by creating a monitor vdev to implicitly
>> influence Rx filters so that hw/fw pass everything up to host. I
>> suspect your RF environment may contain particular traffic patterns
>> which trigger the problem within firmware code related to monitor
>> vdev.
>>
>> You could try hacking up ath10k to not create/start monitor vdev and
>> see if you can still reproduce the problem. Keep in mind bridging will
>> be crippled in some cases since firmware won't deliver some frames to
>> ath10k.
>
> I've actually been using attenuators for these tests, a little more
> than 60dB on each chain and as I mentioned above, I don't believe it's
> a vdev issue due to the fact that I can reproduce this without
> bridging (which doesn't put the radio into promiscuous mode and thus
> doesn't create a monitor vdev, correct?).

To be sure, you can look in dmesg/syslog to verify whether ath10k's
wlan interface is in promiscuous mode or not. I guess it isn't,
though.


>> Hmm.. Now that I think about it, after the recent Rx rework it might
>> be possible to circumvent the problem. ath10k uses very little data
>> from Rx indication event and instead Rx descriptor is used for most
>> things. Popping function would need some changes so that it can back
>> off safely if a frame buffer isn't ready. ath10k would probably need
>> to poll for Rx too.
>
> I saw those patches and I liked what I saw. When I tested with them in
> place, I found it failed in the same 'test' of checking for MSDU_DONE
> flag (duh, just wanted to make that clear, haha). I tried not setting
> the confused flag when the issue was hit, but eventually a kernel
> panic would occur.

I'm glad you like them.

The result of not setting the confused flag in this case is pretty expected :-)


> Something very interesting that I found was that I only see this issue
> when the radios are behind a PCIe bridge. That is, I tested on two sets
> of boards with the same CPU/structure, with the one difference that
> there is no PCIe bridge on one. I am currently running an extended
> test on the board without the PCIe bridge to see whether the issue
> manifests itself at a later time, but as of now, it's looking like
> that's not the case. Do you have any thoughts on this?

Hmm.. Perhaps there's some kind of race and the host main memory isn't
updated in a timely fashion? In that case ath10k could try, instead of
immediately unmapping, to keep syncing DMA to the CPU and checking the
MSDU_DONE bit for a few msecs (what an ugly workaround..) before it
finally unmaps the frame.
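
A rough userspace sketch of that retry loop, just to illustrate the
idea. This is not ath10k code: fake_sync_for_cpu() is a made-up stand-in
for what dma_sync_single_for_cpu() would do in the driver, RX_MSDU_DONE
and both structs are hypothetical, and the late device write is simulated
with a simple counter. The retry budget is an assumption too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MSDU_DONE bit in the Rx descriptor. */
#define RX_MSDU_DONE 0x1u

struct fake_rx_desc {
	volatile uint32_t flags;	/* written by the (simulated) device */
};

struct fake_dev {
	struct fake_rx_desc desc;
	int syncs_until_done;		/* simulated write latency; 0 = never */
};

/*
 * Stand-in for syncing the buffer to the CPU. Here it only simulates
 * the device's write becoming visible after a number of syncs.
 */
static void fake_sync_for_cpu(struct fake_dev *dev)
{
	if (dev->syncs_until_done > 0 && --dev->syncs_until_done == 0)
		dev->desc.flags |= RX_MSDU_DONE;
}

/* Stand-in for a short delay between polls (no-op in this sketch). */
static void fake_delay(void)
{
}

/*
 * Sync and re-check the done bit up to max_tries times. Returns true if
 * the bit showed up, false if the caller should give up (and only then
 * treat the Rx ring as confused).
 */
static bool wait_msdu_done(struct fake_dev *dev, int max_tries)
{
	assert(dev != NULL);

	for (int i = 0; i < max_tries; i++) {
		fake_sync_for_cpu(dev);
		if (dev->desc.flags & RX_MSDU_DONE)
			return true;
		fake_delay();
	}

	return false;
}
```

In the driver the unmap would only happen after this returns, and a
false return would still have to fall back to the current error path.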

I wonder if that's a bug in ath10k's target device PCIe controller or
the PCIe bridge?


Michał


