tx watch dog timeout on resume kills device

Daniel Drake dsd at laptop.org
Tue Apr 19 14:19:57 EDT 2011


In http://dev.laptop.org/ticket/10748 we're seeing the libertas sd8686
occasionally die during resume:

[  885.737199] Restarting tasks ... done.
[  891.020099] libertas: tx watch dog timeout
[  894.030042] libertas: command 0x000b timed out
[  894.034676] libertas: Timeout submitting command 0x000b
[  894.040554] libertas: PREP_CMD: command 0x000b failed: -11
[  896.010255] libertas: tx watch dog timeout
[  899.020103] libertas: command 0x001f timed out
[  899.024664] libertas: Timeout submitting command 0x001f
[  899.030530] ------------[ cut here ]------------
[  899.035468] WARNING: at lib/list_debug.c:30 __list_add+0x44/0x5a()

(The list corruption triggered by this failure is presumably a separate issue.)

I'm still trying to figure out whether there is a conflict in command
sequencing between the 0x1f GET_RSSI command submitted upon the timeout
and 0xb, which seems to be submitted by lbs_get_wireless_stats().
Unfortunately, enabling debug messages seems to make the issue go away.

We're also on 2.6.35; newer kernels don't submit the GET_RSSI command
so we'll be sure to test the latest code as well.

In the meantime, lbs_tx_timeout() seems a bit suspect. It would be
good to get some eyes on it.

I don't understand what this does:
	dev->trans_start = jiffies; /* prevent tx timeout */

The work done by lbs_send_tx_feedback() also seems odd: do we RX a
packet that is still being transmitted? I can't see any other driver
that does this.

Is calling lbs_host_to_card_done() here likely to interfere with any
pending commands?

Finally, how are TX timeouts detected by the network layer? Could the
detection be confused by the time that elapsed during suspend? It seems
suspicious that we get a timeout so soon after resume.

