Hard lockup during vif restart tests.

Michal Kazior michal.kazior at tieto.com
Tue Sep 16 23:34:44 PDT 2014


On 16 September 2014 20:42, Ben Greear <greearb at candelatech.com> wrote:
> This is on a 3.14.14+ hacked kernel, with CT firmware.
>
> Test case is to restart stations (and the AP
> on the other side) every 10-30 seconds.
> After a bit, the station machine locked up hard.
>
> I have no idea how to trouble-shoot this better, so this is
> just FYI.
>
[...]
> ath10k: boot warm reset complete
> ath10k: failed to power up target using warm reset: -110
> ath10k: trying cold reset
> ath10k: boot cold reset
> ath10k: boot cold reset complete
> [hang, even sysrq will not work]

There's a known problem where cold reset can lock up the entire
system (it depends on the PCIe controller, e.g. AP135 splats a
Data Bus Error instead).

Actually, warm reset can do the same in some corner cases: try running
Rx traffic and then starting the recovery sequence (without actually
crashing the fw). My x86 locks up very easily with this.

I strongly suggest you use reset_mode=1 when you load ath10k_pci so
cold reset isn't used. This may result in ath10k being unable to bring
up the device in some rare cases (e.g. after an IOMMU fault if your
system supports it) but I believe it's far better than having the
whole system lock up.
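
For reference, a minimal sketch of making this persistent, assuming
ath10k_pci is built as a module, your build has the reset_mode
parameter, and your distro reads /etc/modprobe.d (the file name below
is just an example, not something the driver requires):

```
# /etc/modprobe.d/ath10k.conf (hypothetical file name)
# reset_mode=1 restricts the driver to warm reset, so cold reset
# is never attempted.
options ath10k_pci reset_mode=1
```

The option only takes effect the next time the module is loaded.
Passing reset_mode=1 directly on the modprobe command line should
work as well for a one-off test.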

My suspicion is that the tx/rx rings, DMA transfer engines and
internal IRQs aren't stopped properly. I have a prototype patch for
the warm reset problem but it's incomplete and I'm not sure if I can
share it yet.


Michał


