wpa_supplicant 4 way handshake timeout with some access points

Galen Seitz galens at seitzassoc.com
Mon Dec 28 20:50:29 PST 2015

On 12/22/15 02:32, Jouni Malinen wrote:
> On Mon, Dec 21, 2015 at 08:15:09PM -0800, Russell Senior wrote:
>> Here are links to the wpa_supplicant log and an over-the-air packet
>> capture made from a nearby device, respectively:
>> https://personaltelco.net/~russell/wpa2j.log
>> https://personaltelco.net/~russell/snoopy2j.pcap
>> The two radios of importance are station: 00:0a:52:25:f9:3a and AP:
>> 30:5a:3a:51:53:c8.  There are lots of other radios nearby, so it helps
>> to filter the pcap file.  The clocks for logging device and the
>> pcap'ing device had a chance to synchronize to ntp servers prior to
>> the captures, so the clocks should be at least close, within the
>> precision of embedded devices.  In this connection, several timeouts
>> occurred before eventual success.  From the packet captures, the 4-way
>> handshake appears to finally succeed when the replay counter of
>> the 4of4 is equal to the replay counter of the most recent 3of4.
> There are at least two separate issues here. As Ilan pointed out,
> something strange happens on the station side in the driver or kernel
> network stack which delays delivery of the first EAPOL-Key msg 3/4 to
> user space. Based on the capture file, that frame was received by the
> station within about 10 ms of EAPOL-Key msg 2/4 TX. However, that msg
> 3/4 is delivered to user space more than 1000 ms later than that. The
> second EAPOL-Key msg 3/4 was delivered shortly thereafter, so it looks
> like something in the kernel blocked delivery of that exact first TX
> attempt of EAPOL-Key msg 3/4 and none of the other EAPOL-Key frames.
> This blocking makes the AP miss the EAPOL-Key msg 4/4 response in time
> before it tries to retransmit EAPOL-Key msg 3/4 with an incremented
> Replay Counter value. It looks like this AP is then rejecting any
> EAPOL-Key msg 4/4 with the earlier Replay Counter value. I would not
> recommend doing so and have modified hostapd to accept any pending value
> just because of this type of issue with the protocol. Anyway, that's
> what the AP here seems to be doing.
> After this, the station has already configured the encryption key and it
> looks like the driver encrypts all outgoing EAPOL-Key frames from this
> point on. However, the received retries of EAPOL-Key msg 3/4 are still
> accepted even though they are not encrypted. As such, wpa_supplicant will
> see them and will try to reply to them with msg 4/4, but the AP won't be
> accepting those responses since they are encrypted with a key that it
> has apparently not yet configured.
> It would be good for all these initial EAPOL-Key msg 4/4 frames to be
> unencrypted and some drivers do have workarounds to make the key
> configuration apply only after this frame. However, this is a somewhat
> inconvenient hack to have to do in a driver. I do actually have a
> workaround patch for mac80211 to do this:
> http://w1.fi/p/0001-mac80211-Do-not-encrypt-EAPOL-frames-before-peer-has.patch
> If the station driver you are using here is one that uses mac80211, this
> patch might help improve robustness of the connection. However, I'd give
> higher priority to figuring out why there is that inconvenient 1000 ms
> blocking of indicating a received EAPOL-Key frame to user space since it
> is much more convenient to not have to even hit the case of the AP
> managing to start retransmission attempts on EAPOL-Key msg 3/4 (which
> are quite a pain with the way the protocol was designed).

Thanks to you and Ilan for looking into this.  Your analysis matches
what we were suspecting.  We have applied the workaround patch for
mac80211 and it does indeed work around the problem.
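
To check that we are reading the replay counter behaviour described
above correctly, here is a minimal Python sketch of the two acceptance
policies: an AP that only accepts a msg 4/4 echoing its most recent
msg 3/4 counter, and one that accepts any counter still pending (as
hostapd is described as doing). The class, field, and function names and
the starting counter value are made up for illustration and do not
correspond to hostapd or to the AP firmware. Encryption of msg 4/4 is
not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Authenticator:
    """Toy model of the AP side of EAPOL-Key msg 3/4 retransmission."""
    accept_any_pending: bool      # True: accept any outstanding counter
    replay_counter: int = 1       # hypothetical starting value
    pending: list = field(default_factory=list)

    def send_msg3(self) -> int:
        """Transmit (or retransmit) msg 3/4 with an incremented counter."""
        self.replay_counter += 1
        self.pending.append(self.replay_counter)
        return self.replay_counter

    def receive_msg4(self, counter: int) -> bool:
        """Return True if a msg 4/4 echoing `counter` is accepted."""
        if self.accept_any_pending:
            return counter in self.pending
        # Strict policy: only the most recently sent counter is valid.
        return bool(self.pending) and counter == self.pending[-1]


def late_reply_accepted(ap: Authenticator) -> bool:
    """Replay the sequence seen in the capture, in simplified form."""
    first = ap.send_msg3()   # first msg 3/4, delivered late to user space
    ap.send_msg3()           # AP times out and retransmits with counter + 1
    # The station's msg 4/4 answering the first attempt arrives only now.
    return ap.receive_msg4(first)
```

With the strict policy the late msg 4/4 is rejected, which would match
the timeouts we saw. With the lenient policy the same reply completes
the handshake.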

We would like to investigate the root cause of the issue, but we have
little or no experience with debugging wireless drivers.  If you have
the time, could you please outline how we might go about debugging this
problem?  I have built a new kernel with event tracing enabled, and I
have debugfs enabled for mac80211 and the driver (rt2x00), but I don't
really know where to begin.

Galen Seitz
galens at seitzassoc.com
