tx watch dog timeout on resume kills device

Dan Williams dcbw at redhat.com
Thu Apr 21 11:41:15 EDT 2011


On Tue, 2011-04-19 at 19:19 +0100, Daniel Drake wrote:
> Hi,
> 
> At http://dev.laptop.org/ticket/10748 we're seeing libertas sd8686
> dying occasionally during resume.
> 
> [  885.737199] Restarting tasks ... done.
> [  891.020099] libertas: tx watch dog timeout
> [  894.030042] libertas: command 0x000b timed out
> [  894.034676] libertas: Timeout submitting command 0x000b
> [  894.040554] libertas: PREP_CMD: command 0x000b failed: -11
> [  896.010255] libertas: tx watch dog timeout
> [  899.020103] libertas: command 0x001f timed out
> [  899.024664] libertas: Timeout submitting command 0x001f
> [  899.030530] ------------[ cut here ]------------
> [  899.035468] WARNING: at lib/list_debug.c:30 __list_add+0x44/0x5a()
> 
> (the list corruption triggered by this failure must be another issue)
> 
> I'm still trying to figure out whether there is a conflict in command
> sequencing between the 0x1f GET_RSSI command submitted upon the timeout
> and 0xb, which seems to be submitted by lbs_get_wireless_stats
> (unfortunately, enabling debug messages seems to make the issue go away).
> 
> We're also on 2.6.35; newer kernels don't submit the GET_RSSI command,
> so we'll be sure to test the latest code as well.
> 
> In the meantime, lbs_tx_timeout() seems a bit suspect. It would be
> good to get some eyes on it.
> 
> I don't understand what this does:
> 	dev->trans_start = jiffies; /* prevent tx timeout */

Yeah, I have no idea what's going on there; I think that code has been
there for a while, and this part might be left over from a rewrite. The
tx_feedback stuff exists purely for radiotap, which we used to use for
monitoring and other things. That code always seemed a bit questionable
to me; perhaps there are better ways of doing this now? I haven't looked
at what other drivers do.

But the core issue is: what should we do when the card fails to TX a
frame within the timeout? Is the card really dead, or does it just
need more time?

Dan

> And the work done by lbs_send_tx_feedback() seems odd (we RX a packet
> that is being transmitted? I can't see any other driver that does this).
> 
> Is calling lbs_host_to_card_done() here likely to screw with any
> pending commands?
> 
> Finally, how are TX timeouts detected by the network layer? Could it
> be confused by the time that elapsed during suspend? It seems suspect
> that we receive a timeout immediately upon resume.
> 
> Thanks,
> Daniel
> 
> _______________________________________________
> libertas-dev mailing list
> libertas-dev at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/libertas-dev




