tx watch dog timeout on resume kills device
dsd at laptop.org
Tue Apr 19 14:19:57 EDT 2011
At http://dev.laptop.org/ticket/10748 we're seeing libertas sd8686
dying occasionally during resume.
[ 885.737199] Restarting tasks ... done.
[ 891.020099] libertas: tx watch dog timeout
[ 894.030042] libertas: command 0x000b timed out
[ 894.034676] libertas: Timeout submitting command 0x000b
[ 894.040554] libertas: PREP_CMD: command 0x000b failed: -11
[ 896.010255] libertas: tx watch dog timeout
[ 899.020103] libertas: command 0x001f timed out
[ 899.024664] libertas: Timeout submitting command 0x001f
[ 899.030530] ------------[ cut here ]------------
[ 899.035468] WARNING: at lib/list_debug.c:30 __list_add+0x44/0x5a()
(the list corruption triggered by this failure must be another issue)
I'm still trying to figure out whether there is some conflict in command
sequencing between the 0x1f GET_RSSI command submitted upon the timeout
and 0x0b, which seems to be submitted by lbs_get_wireless_stats.
(Unfortunately, enabling debug messages seems to avoid the issue.)
We're also on 2.6.35; newer kernels don't submit the GET_RSSI command
so we'll be sure to test the latest code as well.
In the meantime, lbs_tx_timeout() seems a bit suspect. It would be
good to get some eyes on it.
I don't understand what this does:
dev->trans_start = jiffies; /* prevent tx timeout */
And the work done by lbs_send_tx_feedback() seems odd (we RX a
being-transmitted packet? Can't see any other driver that does this)
Is calling lbs_host_to_card_done() here likely to screw with any
Finally, how are TX timeouts detected by the network layer? I guess it
could be confused because of time elapsed during suspend? It seems
suspect that we receive a timeout immediately upon resume.
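For reference, my rough understanding of the network layer's check (the
watchdog in net/sched/sch_generic.c) is that it calls the driver's timeout
handler when the TX queue is stopped and jiffies has passed
trans_start + watchdog_timeo. Below is a minimal userspace model of that
condition, not the kernel code itself: the model_* names and the struct are
made up for illustration, and it assumes a single TX queue.

```c
#include <stdbool.h>

/* Userspace model only: these names are illustrative, not kernel API. */
typedef unsigned long jiffies_t;

struct model_netdev {
    jiffies_t trans_start;    /* tick count at the last transmit */
    jiffies_t watchdog_timeo; /* allowed ticks before a timeout is declared */
    bool queue_stopped;       /* driver has stopped the TX queue */
};

/* Wraparound-safe "a is later than b", same idea as the kernel's time_after(). */
static bool model_time_after(jiffies_t a, jiffies_t b)
{
    return (long)(b - a) < 0;
}

/* The condition under which the watchdog would call the TX timeout handler. */
static bool model_tx_timed_out(const struct model_netdev *dev, jiffies_t now)
{
    return dev->queue_stopped &&
           model_time_after(now, dev->trans_start + dev->watchdog_timeo);
}
```

If this model is right, then writing jiffies into trans_start inside the
timeout handler just pushes the deadline out by another watchdog_timeo
ticks. It would also mean that a queue left stopped across suspend with a
stale trans_start trips the check as soon as the watchdog timer next runs
after resume.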