system hang with backports-20150511/20150525

Michal Kazior michal.kazior at tieto.com
Sun May 31 23:36:06 PDT 2015


+ath10k list

On 1 June 2015 at 03:37, Marty Faltesek <mfaltesek at google.com> wrote:
> Starting with backports-20150511, and continuing with
> backports-20150525, we see frequent system hangs. backports-20150424
> had no issue.

I don't see these releases on
https://backports.wiki.kernel.org/index.php/Main_Page
so I don't know which kernel the drivers were backported from and I
have nothing to compare against.

Can you provide more details, please?


> After the freeze, the console is non-responsive, as well as the
> network stack (ssh/ping does not work). Using sysrq, I can see log
> messages continuing from ath10k_pci after the freeze, along with some
> other threads as well.

You are probably referring to:

[ 1026.951643] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 1026.951674] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 1026.951698] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon

What puzzles me are these timestamps. Firmware generates an SWBA event
(and sends it to the host) every beacon interval, which is ~100ms in
most cases. In your case, however, I can see a burst of at least 10
SWBA events within 1ms. Either the top half (irq) or the bottom half
(tasklet) got stuck for some time.
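
To check the spacing over the whole log rather than by eye, something
like the following rough sketch could help. It is only an illustration:
the function names, the 100ms default and the 0.5 tolerance are my own
assumptions, and the regex is written against the three lines quoted
above, so it may need adjusting for other message formats.

```python
import re

# Matches a dmesg timestamp followed by an ath10k_pci SWBA overrun message.
# Only the part up to "SWBA overrun" is matched, so lines that were wrapped
# by a mail client after "vdev 0," are still counted.
SWBA_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+ath10k_pci\s+\S+\s+SWBA overrun")


def swba_gaps(dmesg_text):
    """Return the gaps (in ms) between consecutive SWBA overrun messages."""
    stamps = [float(m.group(1)) for m in SWBA_RE.finditer(dmesg_text)]
    return [(b - a) * 1000.0 for a, b in zip(stamps, stamps[1:])]


def burst_gaps(dmesg_text, beacon_interval_ms=100.0, tolerance=0.5):
    """Return the gaps that are much shorter than the beacon interval.

    beacon_interval_ms and tolerance are assumed values for illustration.
    """
    threshold = beacon_interval_ms * tolerance
    return [g for g in swba_gaps(dmesg_text) if g < threshold]


SAMPLE = """\
[ 1026.951643] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 1026.951674] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 1026.951698] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
"""

if __name__ == "__main__":
    for gap in burst_gaps(SAMPLE):
        print("suspiciously short SWBA gap: %.3f ms" % gap)
```

Note that this only measures when the messages were printed, not when
the firmware raised the events, so a stalled irq or tasklet that later
drains its backlog would show up as exactly this kind of burst.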

It could be useful if you could enable ath10k debugging with
debug_mask=0xffffff3f (this could generate a lot of messages if you're
running traffic through ath10k).
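
For reference, that mask enables every debug category except two bits.
A quick sketch to see which ones are cleared is below. The category
names in the table are my assumption based on the ath10k_debug_mask
enum in debug.h and may differ in your backports tree, so please check
them against your source.

```python
DEBUG_MASK = 0xffffff3f

# Assumed names from ath10k's debug.h; verify against your tree.
ASSUMED_NAMES = {
    0x40: "ATH10K_DBG_PCI_DUMP",
    0x80: "ATH10K_DBG_HTT_DUMP",
}


def cleared_bits(mask, width=32):
    """Return the bit values that are NOT set in mask, lowest first."""
    return [1 << i for i in range(width) if not mask & (1 << i)]


if __name__ == "__main__":
    for bit in cleared_bits(DEBUG_MASK):
        print("0x%08x disabled (%s)" % (bit, ASSUMED_NAMES.get(bit, "unknown")))
```

If those names are right, the mask leaves out the two hex dump
categories, which are the noisiest ones.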


> mac80211/ath10k/cfg80211 are the only modules in use from backports,
> so it seems like a deadlock could possibly be with mac80211 or
> ath10k.
>
> LOCKDEP didn't reveal anything.

You might want to try tuning /proc/sys/kernel/hung_task_timeout_secs
down (e.g. to 5 or 10 seconds) and see what happens when you hit the
problem.


> Using a 3.2.26 kernel on ARM. AP mode. No encryption.
>
> I've collected ftrace events for sched mac80211 net napi cfg80211
> workqueue, which are included in the dmesg you can find here because
> of its size:
>
> http://tinyurl.com/dmesg-ftrace
>
> In the logs, the last timestamp that my test script wrote is:
>
> [ 1021.291495] hbeat0352
>
> I've captured ftrace events before and after 1021.291495.

Your dmesg looks really messy, so I'm not sure whether the SWBA events
really came in a burst or not.


Michał


