General firmware stability issue.
greearb at candelatech.com
Mon Jun 23 09:05:20 PDT 2014
On 06/22/2014 11:49 PM, Michal Kazior wrote:
> On 19 June 2014 20:58, Ben Greear <greearb at candelatech.com> wrote:
>> When using our firmware and kernel mods, we often see our AP system
>> crash the firmware after several days of various testing.
>> Often after this, it takes a full reboot to bring the system back.
> Can you elaborate on this? Why does it need a full reboot?
I'll send kernel messages next time it happens, but basically it just
fails cold restart over and over again.
>> For those with ability to debug firmware source,
>> at least some of the time, it is a heap list corruption/assert
>> that crashes us, but I have not nailed down exactly where/why yet.
> Some of the time... but what happens the other times? Any crash dump?
Sometimes I get crashes where the firmware says it cannot even read
the crash dump registers. Usually this is after an initial dump
(say, heap crash), and shortly after, the cold restart failure problem
>> Based on some email I received, I believe this problem may
>> happen on standard firmware as well.
>> I am curious to know if anyone else sees this type of problem,
>> and with what regularity.
> I'm aware of one problem with beaconing now. Since there's no "beacon
> tx completed" indication ath10k is forced to blindly unmap/free beacon
> sk_buff when next swba event is handled. In some rare cases when
> target wmi pipes get stuck/lag it's possible to get an IOMMU fault
> (provided your platform supports it and it's enabled) that crashes the
> target so badly it's impossible to even use the CE diag window to read
> out the crash dump. Warm reset is ineffective after that and only cold
> reset is able to bring it up again (but also hangs the host sometimes
> due to hw bug).
That is very interesting. It sounds like that could be the problem
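
To make the beacon lifetime hazard described above concrete, here is a minimal user-space sketch in C. It is not ath10k code: the struct, the function names, and the single-slot model are all hypothetical, chosen only to show why freeing on the next SWBA event is unsafe when the target lags. The real host cannot see the device_busy flag (that is exactly the missing "beacon tx completed" indication); the model tracks it only so the hazard can be counted.

```c
#include <stdbool.h>

/* Hypothetical model, not ath10k code: one beacon buffer slot. */
struct beacon_slot {
    bool mapped;       /* host currently has the buffer DMA-mapped */
    bool device_busy;  /* target has not finished reading the buffer */
};

/*
 * Host side, run on every SWBA event: blindly release the previous
 * beacon, then map and hand over a new one. Returns true when the
 * released buffer was still in use by the target (the fault case).
 */
static bool handle_swba(struct beacon_slot *slot)
{
    bool hazard = false;

    if (slot->mapped) {
        /* The real host has no way to test this before unmapping. */
        hazard = slot->device_busy;
        slot->mapped = false;
    }

    slot->mapped = true;
    slot->device_busy = true;
    return hazard;
}

/* Target side: finished transmitting the current beacon. */
static void device_finish_tx(struct beacon_slot *slot)
{
    slot->device_busy = false;
}

/*
 * Run `intervals` beacon intervals. The target stalls (does not finish
 * its transmit) during interval `stall_at`; pass a negative value for
 * no stall. Returns the number of hazardous unmaps.
 */
int simulate(int intervals, int stall_at)
{
    struct beacon_slot slot = { false, false };
    int hazards = 0;

    for (int i = 0; i < intervals; i++) {
        if (handle_swba(&slot))
            hazards++;
        if (i != stall_at)
            device_finish_tx(&slot);
    }
    return hazards;
}
```

In this model, simulate(10, -1) reports no hazardous unmaps, while a single stalled interval, for example simulate(10, 3), yields one: the buffer is released on the next SWBA while the target still holds it. On a platform with the IOMMU enabled, that is the kind of access that would show up as a fault.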
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc http://www.candelatech.com