mysterious lockdowns in station mode

Tamas Selmeci tselmeci
Tue Jul 7 05:42:21 PDT 2009


Dear all!

I have a rather strange problem.

Environment:
- Intel XScale IXP425, 32 MB RAM, 8 MB flash;
- linux-2.6.29-rc6 (no preempt, no tickless kernel);
- toolchain by buildroot, gcc-4.2.1, uClibc-0.9.30, binutils-2.18;
- hostapd-0.6.9, wpa_supplicant-0.6.9;
- wifi card: TP-Link TL-WN861N (Atheros 5416 with radio AR2122 (?));
- ath9k: compat-wireless-2009-04-20;

The system runs in AP mode (hostapd) without any problem. Switching to 
station mode happens roughly like this:
- ifconfig wlan0 down;
- killall hostapd;
- modprobe -r ath9k;
- modprobe ath9k;
- wpa_supplicant -i wlan0 -B -c /etc/wpa_supplicant.conf;

---------------------------------------------------------------
wpa_supplicant config:

ctrl_interface=/var/run/wpa_supplicant
eapol_version=1
ap_scan=1
fast_reauth=1
---------------------------------------------------------------

Symptom:

A C program connects to wpa_supplicant over two control connections: 
one for regular commands (wpa_ctrl_open (...)), the other for 
unsolicited messages (wpa_ctrl_open (...) and then wpa_ctrl_attach (...)).

While scanning for available networks, the program blocks hard.

Loop:
1) send "SCAN";
2) check pending unsolicited messages;
3) if no WPA_EVENT_SCAN_RESULTS then: { ms_sleep (250); goto 2) };
4) process results;
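
For illustration, steps 2) and 3) of the loop above could be bounded by 
an overall deadline, so that a scan-results event that never arrives 
cannot keep the loop spinning forever. This is only a minimal sketch, 
not the actual program: wait_for_event, event_check_fn and 
countdown_check are hypothetical names. In the real program the 
callback would presumably drain the monitor connection with 
wpa_ctrl_pending()/wpa_ctrl_recv() and look for WPA_EVENT_SCAN_RESULTS.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns true once the awaited event was seen. */
typedef bool (*event_check_fn)(void *ctx);

/* Current time in milliseconds on the monotonic clock. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Call check() every interval_ms until it reports the event or until
 * timeout_ms has elapsed. Returns true on event, false on timeout. */
static bool wait_for_event(event_check_fn check, void *ctx,
                           long interval_ms, long timeout_ms)
{
    assert(check != NULL && interval_ms > 0);
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        if (check(ctx))
            return true;
        if (now_ms() >= deadline)
            return false;
        struct timespec ts = { interval_ms / 1000,
                               (interval_ms % 1000) * 1000000L };
        /* If a signal cuts the sleep short, we simply check again sooner. */
        nanosleep(&ts, NULL);
    }
}

/* Stand-in callback for trying the loop without wpa_supplicant:
 * reports the "event" after *ctx calls. */
static bool countdown_check(void *ctx)
{
    int *left = ctx;
    if (*left > 0)
        (*left)--;
    return *left == 0;
}
```

With something like wait_for_event (scan_results_seen, ctrl, 250, 30000), 
where scan_results_seen is again a hypothetical callback, a timeout 
could be logged and "SCAN" sent again instead of waiting indefinitely. 
Some older C libraries need -lrt for clock_gettime.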

The hard freeze happens at step 3). The 2)-3) loop runs for only a few 
cycles: the unsolicited messages are read, nothing arrives, and then 
ms_sleep (250) is called. ms_sleep does nothing but delay for 250 ms by 
watching kernel jiffies. Instead of 250 ms, the delay can be up to 
10 hours. During this freeze the entire program (which consists of 
multiple threads) stops; nothing runs, and this has been verified. After 
a random period of time ms_sleep gets control back ("wakes up") and 
everything continues. This can happen 2-3 times a night.
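
To narrow down where the time is lost, the sleep itself can be timed. 
The following is a minimal sketch and not the ms_sleep described above: 
it uses nanosleep instead of watching jiffies, and ms_sleep_measured is 
a hypothetical name. It returns the time actually spent, measured on 
CLOCK_MONOTONIC, so that a gross overshoot can be logged together with 
a timestamp.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

/* Sleep for at least ms milliseconds, restarting the sleep if a signal
 * interrupts it. Returns the elapsed time in milliseconds as measured
 * on the monotonic clock. */
static long ms_sleep_measured(long ms)
{
    struct timespec start, end, req, rem;

    clock_gettime(CLOCK_MONOTONIC, &start);

    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;              /* continue with the remaining time */

    clock_gettime(CLOCK_MONOTONIC, &end);

    long long elapsed_ns =
        (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
        + (end.tv_nsec - start.tv_nsec);
    return (long)(elapsed_ns / 1000000);
}
```

If ms_sleep_measured (250) ever returns a value far above 250, the 
thread was not rescheduled in time by the kernel, which would point 
away from the jiffies-based delay loop itself.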

It's very strange, since:
- ms_sleep is thread-safe;
- ms_sleep has been tested with 8 simultaneous threads over hours, no 
freezes;
- ms_sleep is also used in many other places in the program, without any 
problems for many months now;
- the freeze tends to happen if the first scans returned no visible 
networks;

My assumption is that there is some low-level kernel problem here. In 
station mode, scanning does something that makes the kernel stall or 
makes my function misbehave. Or is it a wpa_supplicant/ath9k problem?

Have you ever encountered such a problem? I have been trying to find the 
cause, fix it, or at least create a workaround for at least three weeks 
now.

Notes:
- I also used to pass -D nl80211 on the wpa_supplicant command line, but 
it always reported something like "Unable to enter managed mode"; 
despite this, it seemed to work;
- I have another ms_sleep implementation based on usleep, which produces 
the same behaviour;

Please help, any ideas are appreciated...

Regards,
-- 
Tamas Selmeci
R&D Engineer



