Reproducible issue in hacked 3.17 kernel, CT firmware

Wed Jan 7 01:58:35 PST 2015

On 30 December 2014 at 20:18, Ben Greear <greearb at candelatech.com> wrote:
> yeah, so maybe not reproducible upstream, but anyway...
>
> My test case is to re-associate 4 stations over and over again, with
> a scan and a 5 second sleep between iterations.  After
> a short time, something goes weird and OS is mostly hung, probably
> because important locks are held while ath10k is timing out communication
> to firmware.
>
> The last message I see from firmware is that it is deleting vdev 4.
>
> I do not see any indication that firmware is crashed, but something
> is wrong, maybe mgt buffers are used up?
[...]
> [  342.962494] ath10k_pci 0000:04:00.0: failed to set erp slot for vdev 4: -11

-11 = -EAGAIN, which here means the host is out of wmi-htc tx credits.
I wonder what the firmware dbg buffer is trying to say.

Either the host sent a corrupted message and clogged up the firmware
buffers, the firmware is busy processing other commands (wmi mgmt tx,
wmi bcn non-dma tx), or it became confused/corrupted.

> I'm going to debug this further, but I am curious why the logs appear
> to show that we continue sending cmds (cts_prot, for example) after the
> vdev is configured down?

This follows from what mac80211 does. See ieee80211_set_disassoc(): it
calls sta_info_flush(), then ieee80211_reset_erp_info(), and later
ieee80211_bss_info_change_notify(). These end up in ath10k_bss_disassoc()
and, later, ath10k_bss_info_changed() respectively.

Michał