unicast rekey fundamentally flawed (was: connection hangs after wpa_supplicant re-key)

Alexander Wetzel alexander.wetzel at web.de
Wed Sep 27 11:16:23 PDT 2017


Hello,

>> As above, I can work around the problem by increasing
>> dot11RSNAConfigPMKLifetime in the config file.  I also tried setting
>> "fast_reauth=0" but that did not have an impact.  With
>> "dot11RSNAConfigPMKLifetime=31536000" I've seen a solid connection for
>> multiple days.
>> 
>> Any ideas on how I can further debug/fix this?
>
> Some notes above on what this would take.. Either debug from AP or
> sniffer capture and all the needed keys for analysis.
>
> Using a larger dot11RSNAConfigPMKLifetime value sounds like a reasonable
> workaround for this, though. All it does here is give the AP full
> control on when to force PMK rekeying (i.e., in practice, when to force
> EAP reauthentication).

This seems to be the same issue I had in the past and reported/debugged (also with wlan captures) here
https://patchwork.kernel.org/patch/6449291/ and here https://dev.openwrt.org/ticket/18966

The short version is that unicast rekeys are inherently dangerous when the encryption is offloaded to the card and mac80211 from the Linux kernel is used. (Group rekeys are not affected and are fine.) The root cause lies in the IEEE 802.11 spec itself and was only "fixed" in 802.11-2012. That fix has not been implemented in any WLAN stack I'm aware of, though. (Windows at least seems to have code to handle the issue as a special case when connected to a Linux AP that uses rekeys. There the WLAN also freezes, but recovers within ~1s.)

Here is how I currently understand the issue (this can be wrong and/or incomplete):
When the unicast key is changed but there is no new key ID to switch over to, we are racing the hardware of the WLAN card.
It can happen (in my test environment, 100% of the time) that mac80211 hands a frame to the WLAN card for encryption with a PN belonging to the still-current old key.
While this packet is queued in the card, the unicast key is updated and installed in the card. The packet with the old PN is then encrypted with the new key and sent out.
The other end receives the packet, decrypts it successfully with the new key and then sets the PN for the new key to the value from the packet. That value is of course far too high, since it belongs to the old key. One or two packets later the correct PN is being used, but replay protection now drops the packets until either the PN reaches the old key's value (pretty unlikely to ever happen) or the key is rolled over again, resetting the highest seen PN to zero. The result is that a rekey only works if the WLAN is idle at the critical moment, so that no packets are queued when the key is replaced.

Switching your WLAN card to software encryption prevents the issue on Linux systems, but chances are you will have to do that on both the AP and the client to really prevent the freezes, at least when both are running Linux and mac80211. (We then no longer race the WLAN hardware, so key and PN can no longer run out of sync.)

I'm currently looking at the issue again and trying to put together an acceptable patch to start a new discussion on linux-wireless.
Since that will probably still take some time, I've attached an older but tested interim version of the new kernel patch I'm working on.

The patch will not prevent the broken packets from being sent; it only detects and handles them on the receiving end, for the most probable case (TID=0). Preventing the issue altogether seems to be very hard and expensive, and is certainly beyond my current understanding and coding skills.

At least in my setup, both systems (the AP and the station) must be patched, or the WLAN freezes during a rekey if a data transfer is ongoing.
Since I'm normally testing with flood ping and therefore have the same packet load in both directions, that's expected.

The patch prints "HACK: -RESCUE- new key packet with old pn mitigated" when it encounters and handles a problematic packet.
Here is a quick sample of how a mitigated WLAN freeze looks with the attached patch:

Sep 10 21:24:21.557801 perry kernel: HACK: virgin key detected, enable HACK code path!
Sep 10 21:24:21.557925 perry kernel: HACK     cnt: 00 00 00 00 00 00
Sep 10 21:24:21.557961 perry kernel: HACK old_cnt: 00 00 00 00 47 69
Sep 10 21:24:21.557986 perry kernel: HACK      pn: 00 00 00 00 47 6b
Sep 10 21:24:21.558016 perry kernel: HACK: -RESCUE- new key packet with old pn mitigated
Sep 10 21:24:21.617804 perry kernel: HACK: virgin key detected, enable HACK code path!
Sep 10 21:24:21.617941 perry kernel: HACK     cnt: 00 00 00 00 00 00
Sep 10 21:24:21.617970 perry kernel: HACK old_cnt: 00 00 00 00 47 6b
Sep 10 21:24:21.618007 perry kernel: HACK      pn: 00 00 00 00 00 01
Sep 10 21:24:21.618034 perry kernel: HACK: Switching key over to normal counter

I hope that helps and makes this really hard-to-debug issue more widely known...

As it is, only a small percentage of Linux users will be able to tie this to rekeys. And even finding that out does not help much, since there is absolutely nothing in any debug log or even in a kernel trace. (I tried all of that before giving up and finally patching Wireshark to be able to look at the interesting encrypted packets.) So, short of using one of the patches, you'll only be able to see the issue in a WLAN capture, and only when you are looking for it.
 

Alexander Wetzel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: wpa-hack.patch
Type: text/x-patch
Size: 3786 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/hostap/attachments/20170927/51ba3471/attachment-0001.bin>


More information about the Hostap mailing list