Trouble with key rotation in noisy environments [resend]

Jouni Malinen j
Sun Nov 23 07:35:57 PST 2014

On Mon, Oct 27, 2014 at 07:55:48PM -0400, Avery Pennarun wrote:
> Can anyone else reproduce this?  I'm using ath9k and ath10k APs with
> the latest hostapd from HEAD (though we've tried several other
> versions without any difference).

Are you seeing this with both drivers at roughly the same frequency?

> Steps:
> - Pick a channel in a noisy environment (eg. 2.4 GHz)
> - Configure hostapd to rotate keys extremely frequently:
>     wep_rekey_period=10
>     wpa_group_rekey=10
>     wpa_strict_rekey=1
>     wpa_gmk_rekey=9
>     wpa_ptk_rekey=10

This would obviously be a very undesirable configuration for any real
use, but for testing purposes, I guess it can trigger various issues.
Though, I'm not sure whether you would be able to trigger all the same
issues with more reasonable parameters. In practice, and assuming you
are using CCMP, there would almost never be a real need to rekey the
GTK. Or well, I guess some use cases may like to use strict rekeying.

The combination of running a PTK rekey and a GTK rekey this frequently
may hit some corner cases, and there may not be enough time to go
through the retries from the ongoing 4-way or 2-way handshake before the
next update hits.

> - Connect a client device (tested with Macbook Air and two different
> Linux clients)

Are all clients showing similar issues or do you see a difference based
on which client is used?

> Expected:
> - Key rotations occur every 10 seconds or so but do not affect the traffic.

I'm not sure I would actually have that expectation when combining GTK
and PTK rekeying (which are independent operations). If you can
reproduce issues with just one of the key types being rekeyed every 10
seconds, that would provide a much clearer case.
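
As a rough illustration of splitting the test into single-key-type runs,
the sketch below emits hostapd.conf fragments that enable only the GTK
timers or only the PTK timer, using the values from the quoted
configuration. The helper name rekey_fragment and the grouping of
options are illustrative choices, not part of hostapd. It assumes that
an option left out of hostapd.conf falls back to a default that is far
less aggressive than 10 seconds, and it omits wep_rekey_period on the
assumption that the network uses CCMP rather than dynamic WEP.

```python
# Rekey settings taken from the test configuration quoted above.
GTK_OPTIONS = {
    "wpa_group_rekey": 10,
    "wpa_strict_rekey": 1,
    "wpa_gmk_rekey": 9,
}
PTK_OPTIONS = {
    "wpa_ptk_rekey": 10,
}


def rekey_fragment(gtk: bool, ptk: bool) -> str:
    """Return hostapd.conf lines enabling only the requested rekey timers.

    Assumption: an option that is left out falls back to hostapd's
    built-in default instead of the 10 second test value.
    """
    options = {}
    if gtk:
        options.update(GTK_OPTIONS)
    if ptk:
        options.update(PTK_OPTIONS)
    return "".join(f"{name}={value}\n" for name, value in options.items())


if __name__ == "__main__":
    # Print one fragment per single-key-type test run.
    for label, gtk, ptk in [("GTK only", True, False), ("PTK only", False, True)]:
        print(f"# {label}")
        print(rekey_fragment(gtk, ptk))
```

Running each fragment as a separate test should show whether the stalls
follow one key type or only appear when both timers are active.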

> Actual:
> - Most key rotations go through fine and don't show latency spikes in
> the "blip" application linked above.
> - Every now and then, something goes wrong with the key rotation and
> data stops flowing.  The AP thinks the key rotation has worked, but it
> apparently has not.

There are a couple of very different possible reasons for this. The
hostapd log could identify the cases where PTK and GTK rekeys ended up
happening at more or less the same time and retries were not allowed to
complete due to the following operation. If you happen to hit these and
can find the location in the hostapd debug log, I'd be interested in
taking a closer look at the details.
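
One possible way to locate such overlaps in a long debug log is
sketched below. It is only a starting point: it assumes each log line
begins with a "seconds.microseconds:" timestamp (that is, timestamps are
enabled in the debug output), and the two marker strings are
placeholders that need to be adjusted to whatever the log actually
prints when a PTK or GTK rekey starts. The function names and the one
second window are likewise illustrative.

```python
import re

# Placeholder markers (assumptions): adjust to the text your hostapd
# debug log actually prints when a PTK or GTK rekey starts.
PTK_MARKER = "rekeying PTK"
GTK_MARKER = "rekeying GTK"

# Assumes each line starts with a "seconds.microseconds:" timestamp.
TIMESTAMP = re.compile(r"^(\d+\.\d+):")


def rekey_events(lines):
    """Yield (timestamp, kind) for each timestamped line with a rekey marker."""
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        when = float(match.group(1))
        if PTK_MARKER in line:
            yield when, "PTK"
        elif GTK_MARKER in line:
            yield when, "GTK"


def close_pairs(events, window=1.0):
    """Return pairs of PTK/GTK events that start within `window` seconds."""
    ordered = sorted(events)
    pairs = []
    for i, (t1, kind1) in enumerate(ordered):
        for t2, kind2 in ordered[i + 1:]:
            if t2 - t1 > window:
                break  # later events are even further away
            if kind1 != kind2:
                pairs.append(((t1, kind1), (t2, kind2)))
    return pairs
```

Passing an open log file to rekey_events() and the result to
close_pairs() gives the timestamps around which the debug log is worth
reading in detail.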

Another issue in this area is that some drivers (both AP and station)
have had (and, I'd assume, still have) issues in configuring keys (both
PTK (TK) and GTK) during rekey operations. This can result in an
incorrect key getting programmed into the hardware, so that either the
TX or RX side on one of the devices ends up corrupting a frame or
dropping a frame incorrectly. This can happen even between key types
(e.g., a GTK update resulting in the PTK that was configured earlier
getting corrupted). I've seen these cases in the past, but the issues
have almost never been easy enough to reproduce to make it feasible to
debug or fix them fully. This is pretty painful, but if you have a full
sniffer capture of the issue, you might be able to use wlantest to find
the frames that are sent with an incorrect key. The
RX-with-incorrect-key case can be even more inconvenient to find
without debug logs from both the AP and STA drivers.

> - After about 10 more seconds (possibly because that is the next key
> rotation), hostapd decides the station is broken and disassociates it.
> - Immediately afterward, the station reconnects and resumes its
> activity, and the cycle repeats.

The GTK rekey will time out after a couple of retries and force
reassociation. This commonly fixes any of the issue types mentioned
above, since it both resets the EAPOL state machines and reconfigures
the encryption keys in the hardware.

As a side note, this recent commit is likely to help robustness with
some client devices:

AP: Extend EAPOL-Key msg 1/4 retry workaround for changing SNonce

Not that I believe this to be the only issue, but it can certainly
help in some cases where the station fails to reply to PTK rekeying (or
the initial 4-way handshake) within 100 ms and happens to use a
supplicant design that updates SNonce values during the protocol (which
is something wpa_supplicant does not do, in order to avoid this type of
issue, but some other supplicant implementations do).

Jouni Malinen                                            PGP id EFC895FA
