ath10k + INTEL_IDLE aka. cstates == firmware crash

Michal Kazior michal.kazior at tieto.com
Mon Feb 23 05:32:24 PST 2015


On 23 February 2015 at 14:08, Fabian Wittenberg
<Fabian.Wittenberg at sophos.com> wrote:
> Hi all,
>
> We are using mini-PCIe cards based on the brand new QCA988x chipset in our
> newest wifi-enabled firewall appliance, and we have had a lot of problems
> getting them running (Intel Rangeley platform; Intel(R) Atom(TM) CPU C2558
> @ 2.40GHz).

I recall one guy complaining that his Atom-based laptop wasn't happy
running ath10k either, but I think that was some electrical
incompatibility: the machine didn't even POST when the card was plugged
into the mPCIe slot.


> The card crashed after some minutes using the ath10k driver
> (backports-3.19-rc1). Older versions are affected as well, at least down
> to 3.12.20. I did intensive debugging and found that there are major
> issues as soon as Intel's processor C-states are used. The option is
> called "CONFIG_INTEL_IDLE" in the kernel config. This seems to be a very
> serious issue, as it can even lead to low memory corruption and kernel
> freezes. Low memory corruption doesn't always occur, only sometimes,
> which makes it hard to debug. You also need a multiprocessor system to
> trigger the issue: if you set the kernel parameter "maxcpus=1", the error
> doesn't occur even with CONFIG_INTEL_IDLE enabled.

Through a quick search I've found this:

  https://bugzilla.redhat.com/show_bug.cgi?id=715485

It looks like some BIOSes can have buggy C-state handling. Maybe
that's the root cause? From my experience QCA988x can sometimes be
quirky when it comes to PCIe, so I wouldn't be surprised if other
devices don't crash.
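
To compare a crashing and a non-crashing boot, it may help to record
which cpuidle driver is active and which idle states each CPU exposes.
The following is only a minimal sketch: it assumes the usual cpuidle
sysfs layout under /sys/devices/system/cpu, and the function name
read_cpuidle is made up for illustration.

```python
from pathlib import Path


def read_cpuidle(sysfs_root="/sys/devices/system/cpu"):
    """Return (driver, {cpu_name: [idle state names]}) read from sysfs.

    Assumes the standard cpuidle layout; missing files are skipped, so
    this returns (None, {}) on systems without cpuidle support.
    """
    root = Path(sysfs_root)

    # Name of the active cpuidle driver, e.g. "intel_idle" or "acpi_idle".
    driver_file = root / "cpuidle" / "current_driver"
    driver = driver_file.read_text().strip() if driver_file.is_file() else None

    states = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not the "cpuidle" directory.
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        idle_dir = cpu_dir / "cpuidle"
        if not idle_dir.is_dir():
            continue
        names = []
        for state_dir in sorted(idle_dir.glob("state[0-9]*")):
            name_file = state_dir / "name"
            if name_file.is_file():
                names.append(name_file.read_text().strip())
        if names:
            states[cpu_dir.name] = names
    return driver, states


if __name__ == "__main__":
    driver, states = read_cpuidle()
    print("cpuidle driver:", driver)
    for cpu, names in states.items():
        print(cpu, "->", ", ".join(names))
```

As far as I know, booting with "intel_idle.max_cstate=0" makes the
kernel fall back from intel_idle to acpi_idle without a rebuild, which
could be another data point next to the "maxcpus=1" test.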


> Kernel output looks like this if the card stops working:
>
>
> [ 3715.145865] ath10k: failed to install key for vdev 2 peer 00:1a:8c:0a:b5:01: -11
> [ 3715.145876] wifi1: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)
> [ 3718.148226] ath10k: failed to install key for vdev 2 peer 00:1a:8c:0a:b5:01: -11
> [ 3718.148236] wifi1: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-11)
> [ 3723.152167] ath10k: failed to install key for vdev 0 peer 00:1a:8c:0a:34:01: -11
> [ 3723.152178] wifi0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)
> [ 3723.152185] ath10k: failed to transmit management frame via WMI: -11
> [ 3726.154524] ath10k: failed to install key for vdev 0 peer 00:1a:8c:0a:34:01: -11
> [ 3726.154535] wifi0: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-11)
> [ 3729.156884] ath10k: failed to install key for vdev 0 peer 00:0e:8e:ae:5c:1c: -11
> [ 3729.156890] ath10k: failed to transmit management frame via WMI: -11
> [ 3729.156904] wifi0: failed to remove key (0, 00:0e:8e:ae:5c:1c) from hardware (-11)
> [ 3732.159255] ath10k: failed to remove peer wep key 0: -11
> [ 3732.159265] ath10k: failed to clear all peer wep keys for vdev 0: -11
> [ 3732.159273] ath10k: failed to disassociate station: 00:0e:8e:ae:5c:1c vdev 0: -11
[...]
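
For longer logs, a small tally of which component reported which error
code can make the pattern easier to see. This is a rough sketch that
assumes the dmesg line format shown above; the names LINE_RE,
ERRNO_NAMES and tally_errors are hypothetical, and the errno table only
covers the one Linux value that appears in this log.

```python
import re
from collections import Counter

# "[ <timestamp>] <source>: <message> <errno>", where the errno is either
# a bare trailing "-11" or a parenthesised "(-11)". Quote prefixes such as
# "> " are tolerated because the pattern is used with search().
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<src>\w+):.*?\(?(?P<err>-\d+)\)?\s*$"
)

# Linux errno numbers; only the value seen in the log is listed here.
ERRNO_NAMES = {11: "EAGAIN"}


def tally_errors(lines):
    """Count (source, errno name) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a log line in the assumed format
        err = int(match.group("err"))
        name = ERRNO_NAMES.get(-err, str(err))
        counts[(match.group("src"), name)] += 1
    return counts


if __name__ == "__main__":
    import sys

    for (src, name), n in sorted(tally_errors(sys.stdin).items()):
        print(src, name, n)
```

One possible use is piping dmesg output into the script on standard
input.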

It seems the firmware stopped replenishing WMI-HTC Tx credits. It's
most likely not the mgmt-related Tx credit starvation; instead,
communication with the device is really broken.
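
To illustrate what running out of Tx credits means, here is a toy model
of credit-based flow control. It is not the ath10k HTC code: the class
CreditChannel and its methods are invented for this sketch, and EAGAIN
is hard-coded to 11, the Linux value that matches the -11 in the log.

```python
EAGAIN = 11  # Linux errno value; the log above shows it as -11


class CreditChannel:
    """Toy host-side view of a credit-limited command channel."""

    def __init__(self, credits):
        self.credits = credits  # credits the host may still spend
        self.in_flight = 0      # credits held by the device

    def send(self, cost=1):
        """Spend credits for one command; -EAGAIN if none are left."""
        if self.credits < cost:
            return -EAGAIN
        self.credits -= cost
        self.in_flight += cost
        return 0

    def replenish(self, n=1):
        """Device hands back up to n credits after finishing commands."""
        n = min(n, self.in_flight)
        self.in_flight -= n
        self.credits += n


def run(commands, credits, device_alive):
    """Send a number of commands; a dead device never returns credits."""
    channel = CreditChannel(credits)
    results = []
    for _ in range(commands):
        results.append(channel.send())
        if device_alive:
            channel.replenish()
    return results


if __name__ == "__main__":
    print("healthy:", run(5, credits=2, device_alive=True))
    print("stalled:", run(5, credits=2, device_alive=False))
```

In the stalled case every send fails once the initial credits are used
up, whatever the command is. That would fit the log, where key
installs, management frames and disassociation all fail with the same
code, and the roughly 3 second spacing between failures would be
consistent with a wait for credits that times out.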


> Sometimes, but not always, there is the message "firmware crashed!" in
> dmesg, but it doesn't matter which error message actually appears: the
> behavior is always the same. The card stops working until reboot.
> Unloading and reloading ath10k_pci, ath10k_core and ath doesn't help in
> this case. The basic problem behind all the error messages I have seen so
> far is a broken link between the card's firmware and the ath10k driver.
> Depending on the point in time at which this "connection loss" happens,
> the error messages differ a little, as they depend strongly on the state
> the driver is in while it tries to talk to the card's firmware via WMI.
>
> If you try to reproduce this, you have to wait between 3 and 60 minutes
> to see the crash. You can increase the likelihood of a crash by
> increasing the amount of wifi traffic from foreign networks on the same
> channel. I tested with four laptops connected to four QCA988x cards
> (AP mode); with that setup it takes around 3-10 minutes to reproduce.
>
> If you need more information I'm at your disposal.

It'd be nice to know what firmware you're using. Generally I would
discourage using 999.999.0.636 because it's very old.


Michał


