Host mode instability & device lockups

Andy Ross andy at plausible.org
Wed May 1 12:27:04 EDT 2013


I'm having a rough time with these devices.  They are fairly reliably
locking up for me under hostapd use after a few hours to a day of use.
I've tried 2 separate TP-Link TL-WN821Nv3 (AR7010) units and 3
separate NetGear WNA1100 (AR9271) without success.  I've tried voodoo
like attaching a heat sink to the TP-Link chip (it does get awfully
hot) and disabling hwcrypt in the driver without effect.

What I'm seeing is that the device just stops responding.  Looking at
usbmon (I'm on the Fedora 18 3.8 kernel and a firmware build from
yesterday with a quick hack to revert the version number change), I see
things like:

    ffff8801914ae9c0 2368252447 S Bo:1:007:4 -115 16 = 01000008 0188ffff 00142241 00052008
    ffff880191575240 2368352456 S Bo:1:007:4 -115 20 = 0100000c 0188ffff 00152242 00052008 fffffffb
    ffff8801914ae780 2369252420 S Bo:1:007:4 -115 16 = 01000008 0188ffff 00142243 00052008
    ffff880191575180 2369352447 S Bo:1:007:4 -115 20 = 0100000c 0188ffff 00152244 00052008 fffffffb
    ffff88019ef35000 2370252409 S Bo:1:007:4 -115 16 = 01000008 0188ffff 00142245 00052008
    ffff880191575540 2370352445 S Bo:1:007:4 -115 20 = 0100000c 0188ffff 00152246 00052008 fffffffb
    ffff88019ef350c0 2371252447 S Bo:1:007:4 -115 16 = 01000008 0188ffff 00142247 00052008
    ffff880191575600 2371352446 S Bo:1:007:4 -115 20 = 0100000c 0188ffff 00152248 00052008 fffffffb
    ...

Which looks to me (I'm no expert on the kernel USB layer) like the
driver is submitting ("S") pairs of packets (spaced 100ms apart) every
second, forever, without a reply (a "C" callback record).  My
assumption is that these transactions are going un-ACKed at the USB
layer.
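
To check that reading against a longer capture, a small script can pair
up submissions and completions.  The following is only a sketch: it
assumes the usbmon text format (URB tag, timestamp in microseconds,
event type, address word, ...), and the helper names parse_line,
pending_urbs and submit_gaps_ms are made up for illustration.  It treats
any "S" record with no later "C" or "E" record for the same URB tag as
still outstanding.

```python
# Sketch: find usbmon submissions that never got a completion record.
# Assumes the usbmon text format: <urb tag> <timestamp us> <event> <address> ...
# The function names here are hypothetical, not part of any existing tool.

SAMPLE = """\
ffff8801914ae9c0 2368252447 S Bo:1:007:4 -115 16 = 01000008 0188ffff 00142241 00052008
ffff880191575240 2368352456 S Bo:1:007:4 -115 20 = 0100000c 0188ffff 00152242 00052008 fffffffb
ffff8801914ae780 2369252420 S Bo:1:007:4 -115 16 = 01000008 0188ffff 00142243 00052008
ffff880191575180 2369352447 S Bo:1:007:4 -115 20 = 0100000c 0188ffff 00152244 00052008 fffffffb
"""


def parse_line(line):
    """Return (tag, timestamp_us, event, address), or None for other lines."""
    fields = line.split()
    if len(fields) < 4 or fields[2] not in ("S", "C", "E"):
        return None
    return fields[0], int(fields[1]), fields[2], fields[3]


def pending_urbs(lines):
    """Map URB tag -> (timestamp_us, address) for submissions never completed."""
    pending = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        tag, stamp, event, address = rec
        if event == "S":
            pending[tag] = (stamp, address)
        else:
            # "C" (callback) or "E" (submission error) retires the URB.
            pending.pop(tag, None)
    return pending


def submit_gaps_ms(lines):
    """Gaps between consecutive submissions, rounded to whole milliseconds."""
    stamps = [rec[1] for rec in map(parse_line, lines) if rec and rec[2] == "S"]
    return [round((b - a) / 1000) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    lines = SAMPLE.splitlines()
    print(len(pending_urbs(lines)), "submissions with no completion")
    print("gaps between submissions (ms):", submit_gaps_ms(lines))
```

Run on the first four records of the trace above, this reports four
outstanding submissions and gaps of about 100, 900 and 100 ms, which
matches the "pairs every second" pattern.  Feeding it a whole capture
file (one record per line) would show whether the outstanding count
only ever grows.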

Now, this right here seems like a driver bug, because if my
understanding is right these URB requests are going to build up
forever and eventually exhaust memory.  The driver really should be
detecting this case and backing out or trying to reset, and it
doesn't.  (And in fact it's worse, because trying to rmmod the module
in this state actually locks up the host network layer somehow!
Routing still works, but I can no longer open new connections or ssh
in.)

But what can be done at the firmware level?  Surely something has
gotten confused here, and shouldn't there be some watchdogging
available to effect a device reset that can be seen by the host?

Any ideas?  I'd really like to get these receivers running reliably as
host nodes, but right now they're basically useless.  The fact that
I've tried two different chips and five units with almost identical
behavior seems to argue strongly against simple hardware instability.

Andy



