General firmware stability issue.

Michal Kazior michal.kazior at tieto.com
Mon Jun 23 22:32:25 PDT 2014


On 23 June 2014 22:48, Ben Greear <greearb at candelatech.com> wrote:
> On 06/23/2014 09:05 AM, Ben Greear wrote:
>>
>>
>> On 06/22/2014 11:49 PM, Michal Kazior wrote:
>>> On 19 June 2014 20:58, Ben Greear <greearb at candelatech.com> wrote:
>>>> When using our firmware and kernel mods, we often see our AP system
>>>> crash the firmware after several days of various testing.
>>>>
>>>> Often after this, it takes a full reboot to bring the system back.
>>>
>>> Can you elaborate on this? Why does it need a full reboot?
>>
>> I'll send kernel messages next time it happens, but basically it just
>> fails cold restart over and over again.
>
> Here's logs from a station system that had a problem of this nature.  Since
> it should not be doing any beaconing, I guess the root cause of at least this
> particular problem is different.  This is with our firmware and hacked ath10k
> driver, so of course it is possible it is not an upstream problem.

The cause may be different but the mechanism might be the same, i.e.
at some point the target accesses an invalid memory address on the
host and the controller stops responding.


> Kernel is 3.14, with most of the ath10k patches from 3.15 backported to it,
> plus additional patches.
>
> http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.14.dev.y/.git;a=summary
>
> Jun 22 10:00:00 localhost kernel: ath10k: Creating vdev id: 0  map: 68719476735
> Jun 22 10:00:00 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): sta1: link is not ready
> Jun 22 10:00:00 localhost kernel: ath10k: Creating vdev id: 1  map: 68719476734
> Jun 22 10:00:00 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): sta2: link is not ready
> Jun 22 10:00:01 localhost kernel: ath10k: stop, state OFF
> Jun 22 10:00:02 localhost kernel: e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Jun 22 10:00:02 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> Jun 22 10:00:04 localhost kernel: ath10k: Target ready! transmit resources: 3 size:1792
> Jun 22 10:00:04 localhost kernel: ath10k: wmi event firmware message 'P 73 V 36 T 411'
> Jun 22 10:00:04 localhost kernel: ath10k: wmi event firmware message 'msdu-desc: 808  sw-crypt: 1'
> Jun 22 10:00:04 localhost kernel: ath10k: wmi event firmware message 'alloc rem: 4332 iram: 57220'
> Jun 22 10:00:04 localhost kernel: ath10k: start, state going from OFF to ON
> Jun 22 10:00:04 localhost kernel: ath10k: Creating vdev id: 0  map: 68719476735
> Jun 22 10:00:04 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): sta1: link is not ready
> Jun 22 10:00:05 localhost kernel: ath10k: stop, state OFF
> Jun 22 10:00:08 localhost kernel: ath10k: failed to receive initialized event from target: 00000000
> Jun 22 10:00:08 localhost kernel: ath10k: failed to wait for target to init: -110
> Jun 22 10:00:08 localhost kernel: ath10k: failed to power up target using warm reset: -110
> Jun 22 10:00:08 localhost kernel: ath10k: trying cold reset

You might want to try out my warm reset patch from Kalle's tree to
reduce usage of cold reset.
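
For reference, the fallback visible in the log above (warm reset fails
with -110, which is -ETIMEDOUT on Linux, and cold reset is tried next)
can be sketched roughly as below. This is not the actual ath10k code:
the reset_fn type, power_up_with_fallback() and the two stub callbacks
are hypothetical names made up for illustration.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical reset callback: returns 0 on success, negative errno on failure. */
typedef int (*reset_fn)(void *ctx);

/*
 * Try the less invasive warm reset first and only fall back to cold
 * reset when it fails. Returns 0 on success, otherwise the error
 * reported by the cold reset attempt.
 */
static int power_up_with_fallback(reset_fn warm, reset_fn cold, void *ctx)
{
    int ret = warm(ctx);

    if (ret == 0)
        return 0;

    fprintf(stderr, "warm reset failed: %d, trying cold reset\n", ret);
    return cold(ctx);
}

/* Illustrative stubs standing in for real hardware resets. */
static int reset_ok(void *ctx)
{
    (void)ctx;
    return 0;
}

static int reset_timeout(void *ctx)
{
    (void)ctx;
    return -ETIMEDOUT;
}
```

With these stubs, power_up_with_fallback(reset_timeout, reset_ok, NULL)
succeeds through the cold path, while passing reset_timeout for both
callbacks returns -ETIMEDOUT.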


> Jun 22 10:00:08 localhost kernel: ath10k: target took longer 5000 us to wake up (awake count 1)
> Jun 22 10:00:11 localhost kernel: ath10k: failed to receive initialized event from target: ffffffff

0xffffffff from an ioread32()? It looks as if the device was
disconnected from the bus.
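
A minimal sketch of that heuristic, assuming the register value has
already been read into a 32-bit variable: a read from a PCI device that
no longer responds typically comes back as all ones. The helper name
reg_read_looks_disconnected() is hypothetical and not part of ath10k.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Heuristic only: an all-ones value usually means the device did not
 * answer the read, but a register could legitimately hold 0xffffffff,
 * so a caller should treat this as a hint, not as proof.
 */
static bool reg_read_looks_disconnected(uint32_t val)
{
    return val == UINT32_MAX;
}
```

Applied to the two values in the log, 00000000 would not be flagged
while ffffffff would.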

Perhaps your controller is more resilient to the hw cold reset bug and
you just end up with a device that looks as if it were disconnected.
For example, when cold reset fails my T430 hangs, while AP135 just
complains with a "data bus error". Both cases need a reboot to make
things work again.


Michał


