wpa_supplicant 4 way handshake timeout with some access points

Jouni Malinen j at w1.fi
Tue Dec 22 02:32:46 PST 2015


On Mon, Dec 21, 2015 at 08:15:09PM -0800, Russell Senior wrote:
> Here are links to the wpa_supplicant log and an over-the-air packet
> capture made from a nearby device, respectively:
> 
> https://personaltelco.net/~russell/wpa2j.log
> https://personaltelco.net/~russell/snoopy2j.pcap
> 
> The two radios of importance are station: 00:0a:52:25:f9:3a and AP:
> 30:5a:3a:51:53:c8.  There are lots of other radios nearby, so it helps
> to filter the pcap file.  The clocks for logging device and the
> pcap'ing device had a chance to synchronize to ntp servers prior to
> the captures, so the clocks should be at least close, within the
> precision of embedded devices.  In this connection, several timeouts
> occurred before eventual success.  From the packet captures, the 4-way
> handshake appears to be finally successful when the replay counter of
> the 4of4 is equal to the replay counter of the most recent in 3of4.

There are at least two separate issues here. As Ilan pointed out,
something strange happens on the station side in the driver or kernel
network stack which delays delivery of the first EAPOL-Key msg 3/4 to
user space. Based on the capture file, that frame was received by the
station within about 10 ms of EAPOL-Key msg 2/4 TX. However, that msg
3/4 is delivered to user space more than 1000 ms later. The second
EAPOL-Key msg 3/4 was delivered shortly thereafter, so it looks like
something in the kernel blocked delivery of exactly that first TX
attempt of EAPOL-Key msg 3/4 and none of the other EAPOL-Key frames.

Because of this blocking, the AP does not receive the EAPOL-Key msg 4/4
response before it retransmits EAPOL-Key msg 3/4 with an incremented
Replay Counter value. It looks like this AP then rejects any EAPOL-Key
msg 4/4 that carries the earlier Replay Counter value. I would not
recommend doing so and have modified hostapd to accept any pending value
precisely because of this type of issue with the protocol. Anyway,
that's what the AP here seems to be doing.
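
The difference between the two acceptance policies can be illustrated
with a small Python sketch. This is only a toy model and not hostapd
code: the Authenticator class, its field names, and the starting counter
value are hypothetical, chosen to show why a late msg 4/4 fails against
a strict AP but succeeds when any pending Replay Counter is accepted.

```python
from dataclasses import dataclass, field


@dataclass
class Authenticator:
    """Toy model of the AP side of EAPOL-Key msg 3/4 retransmission."""
    accept_any_pending: bool      # False: only the latest counter is valid
    replay_counter: int = 1       # hypothetical value used before msg 3/4
    pending: set = field(default_factory=set)

    def send_msg3(self) -> int:
        """Send (or retransmit) msg 3/4 with an incremented Replay Counter."""
        self.replay_counter += 1
        if not self.accept_any_pending:
            self.pending.clear()  # strict policy: forget earlier attempts
        self.pending.add(self.replay_counter)
        return self.replay_counter

    def receive_msg4(self, counter: int) -> bool:
        """Return True if a msg 4/4 echoing this counter is accepted."""
        return counter in self.pending


def late_reply_accepted(accept_any_pending: bool) -> bool:
    ap = Authenticator(accept_any_pending)
    first = ap.send_msg3()  # first TX attempt, delivered late on the station
    ap.send_msg3()          # AP times out and retransmits with counter + 1
    # The station finally answers the first attempt, echoing its counter.
    return ap.receive_msg4(first)


print("strict AP accepts late msg 4/4: ", late_reply_accepted(False))
print("lenient AP accepts late msg 4/4:", late_reply_accepted(True))
```

In this model the strict AP drops the late reply, while the AP that
tracks all pending values completes the handshake.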

After this, the station has already configured the encryption key and it
looks like the driver encrypts all outgoing EAPOL-Key frames from this
point on. However, the received retries of EAPOL-Key msg 3/4 are still
accepted even though they are not encrypted. As such, wpa_supplicant will
see them and will try to reply to them with msg 4/4, but the AP won't be
accepting those responses since they are encrypted with a key that it
has apparently not yet configured.

It would be good for all these initial EAPOL-Key msg 4/4 frames to be
unencrypted and some drivers do have workarounds to make the key
configuration apply only after this frame. However, this is a somewhat
inconvenient hack to have to do in a driver. I do actually have a
workaround patch for mac80211 to do this:
http://w1.fi/p/0001-mac80211-Do-not-encrypt-EAPOL-frames-before-peer-has.patch
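
To illustrate why keeping msg 4/4 unencrypted helps, here is a minimal
Python sketch of a single handshake attempt. It is a toy model under
stated assumptions, not the logic of the patch above or of any driver:
the function name and parameters are hypothetical, the AP is assumed to
accept only a timely and unencrypted msg 4/4, and only the reply to the
first msg 3/4 is assumed to arrive too late.

```python
def handshake_completes(plaintext_msg4: bool, msg3_attempts: int = 3) -> bool:
    """Toy model of one 4-way handshake against an AP that accepts only a
    timely, unencrypted msg 4/4 (it has not configured the key yet)."""
    key_installed = False  # station side: pairwise key configured in driver
    for attempt in range(1, msg3_attempts + 1):
        # Without the workaround, everything sent after key installation
        # is encrypted, including EAPOL-Key msg 4/4.
        encrypted = key_installed and not plaintext_msg4
        key_installed = True  # key is configured once msg 3/4 is processed
        # Assumption: only the reply to the first msg 3/4 is late, because
        # that frame was held back before reaching user space.
        in_time = attempt > 1
        if in_time and not encrypted:
            return True
    return False


print("driver encrypts msg 4/4 retries:", handshake_completes(False))
print("msg 4/4 kept unencrypted:       ", handshake_completes(True))
```

In this model the handshake fails when the retried msg 4/4 frames are
encrypted and completes on the second attempt when they are sent in the
clear.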

If the station driver you are using here is one that uses mac80211, this
patch might help improve robustness of the connection. However, I'd give
higher priority to figuring out why a received EAPOL-Key frame is held
back for about 1000 ms before being indicated to user space. It is much
better to never hit the case where the AP starts retransmitting
EAPOL-Key msg 3/4 in the first place (such retransmissions are quite a
pain with the way the protocol was designed).

-- 
Jouni Malinen                                            PGP id EFC895FA


