[PATCH ath-next v3] wifi: ath11k: fix group data packet drops during rekey
Rameshkumar Sundaram
rameshkumar.sundaram at oss.qualcomm.com
Tue Aug 5 09:34:33 PDT 2025
On 8/5/2025 2:55 PM, Raja Mani wrote:
>
>> During GTK rekey, mac80211 issues a clear key (if the old key exists)
>> followed by an install key operation in the same context. This causes
>> ath11k to send two WMI commands in quick succession: one to clear the
>> old key and another to install the new key in the same slot.
>>
>> Under certain conditions—especially under high load or time sensitive
>> scenarios, firmware may process these commands asynchronously in a way
>> that firmware assumes the key is cleared whereas hardware has a valid
>> key.
>
> IIUC, The firmware cleared the old key in the hardware but it missed
> to install new key in the problematic condition.
>
> The rest LGTM.
>
Not exactly, Firmware has processed clear key and set key in right
order, however the clear key call back from Hardware arrived much later
(after the firmware had processed the set key request from host) and
this clear key call back wiped out the latest key that was set in firmware.
But hardware will have a valid key as it had later processed the set key
request from firmware.
adding firmware analysis:
----
The race condition exists between Key clear CONNECTION BLOCK call back
and next Key install WMI command causing this issue of no mcast frames OTA.
1. WMI Key clear command is received - initiates CONNECTION BLOCK for peer
2. WMI Key Install is received, sets Key into the HW and initiates
CONNECTION BLOCK for peer
3. WMI key clear CONNECTION BLOCK callback is received from HW and this
*clears the key* and initiates CONNECTION UNBLOCK for the peer
>>>> this is where the problem of clearing the latest set key occurs
4. Key Install CONNECTION BLOCK callback is received from HW - this is
supposed to update the mpduQ length on to the TQM, but due to this race
condition, as the key install flags are cleared, this update fails and
it enters the limbo state.
>> the callbacks from HW seems to not handled properly for immediate
clear followed by set, so basically when set key call back is invoked
the key doesn't exist at all in FW (as previous clear key call back had
cleared it unknowingly)
5. Finally the CONNECTION UNBLOCK schedule enables the peer.
Now the mpduQ is having higher MPDU length but the encryption is not
enabled causing this GT MPDU flush.
----
>> This inconsistency between hardware and firmware leads to group addressed
>> packet drops. Only setting the same key again can restore a valid key in
>> firmware and allow packets to be transmitted.
>>
>> This issue remained latent because the host's clear key commands were
>> not effective in firmware until commit 436a4e886598 ("ath11k: clear the
>> keys properly via DISABLE_KEY"). That commit enabled the host to
>> explicitly clear group keys, which inadvertently exposed the race.
>>
>> To mitigate this, restrict group key clearing across all modes (AP, STA,
>> MESH). During rekey, the new key can simply be set on top of the previous
>> one, avoiding the need for a clear followed by a set.
>>
>> However, in AP mode specifically, permit group key clearing when no
>> stations are associated. This exception supports transitions from secure
>> modes (e.g., WPA2/WPA3) to open mode, during which all associated peers
>> are removed and the group key is cleared as part of the transition.
>>
>> Add a per-BSS station counter to track the presence of stations during
>> set key operations. Also add a reset_group_keys flag to track the key
>> re-installation state and avoid repeated installation of the same key
>> when the number of connected stations transitions to non-zero within a
>> rekey period.
>>
>> Additionally, for AP and Mesh modes, when the first station associates,
>> reinstall the same group key that was last set. This ensures that the
>> firmware recovers from any race that may have occurred during a previous
>> key clear when no stations were associated.
>>
>> This change ensures that key clearing is permitted only when no clients
>> are connected, avoiding packet loss while enabling dynamic security mode
>> transitions.
>>
>> Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.9.0.1-02146-QCAHKSWPL_SILICONZ-1
>> Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-
>> QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
>>
>> Reported-by: Steffen Moser <lists at steffen-moser.de>
>> Closes: https://lore.kernel.org/linux-wireless/c6366409-9928-4dd7-
>> bf7b- ba7fcf20eabf at steffen-moser.de
>> Fixes: 436a4e886598 ("ath11k: clear the keys properly via DISABLE_KEY")
>> Signed-off-by: Rameshkumar Sundaram
>> <rameshkumar.sundaram at oss.qualcomm.com>
>> ---
>> v3:
>> - Allowed clear key strictly only for AP mode
>> - Rephrased the commit text and comment in ath11k_mac_op_set_key()
>> to explain that the race occures rarely under heavy load and
>> discrepancy of states between firmware and hardware
>> - Added newline (\n) at the end of warn log in ath11k_set_group_keys()
>> - Made addr varaible as const * in ath11k_set_group_keys()
>> v2:
>> - Followed r-xmas style
>> - Removed vdev_type check before calling ath11k_set_group_keys()
>> - Removed lockdep assert in ath11k_set_group_keys()
>> - Removed flags variable and passed WMI_KEY_GROUP in
>> ath11k_set_group_keys()
>> - Changed the if condition to have positive cases in
>> ath11k_mac_op_set_key()
>> - Added code comments in ath11k_mac_op_set_key() to explain how
>> clear key and
>> set key are issue back to back by mac80211 and added high level
>> information
>> about the firmware race.
>> - Added comments in ath11k_mac_op_set_key() and
>> ath11k_mac_station_add() to
>> explain why a reinstall of same key is needed.
>> ---
>> drivers/net/wireless/ath/ath11k/core.h | 2 +
>> drivers/net/wireless/ath/ath11k/mac.c | 111 +++++++++++++++++++++++--
>> 2 files changed, 104 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath11k/core.h b/drivers/net/
>> wireless/ath/ath11k/core.h
>> index 220d69a7a429..e8780b05ce11 100644
>> --- a/drivers/net/wireless/ath/ath11k/core.h
>> +++ b/drivers/net/wireless/ath/ath11k/core.h
>> @@ -411,6 +411,8 @@ struct ath11k_vif {
>> bool do_not_send_tmpl;
>> struct ath11k_arp_ns_offload arp_ns_offload;
>> struct ath11k_rekey_data rekey_data;
>> + u32 num_stations;
>> + bool reinstall_group_keys;
>> struct ath11k_reg_tpc_power_info reg_tpc_info;
>> diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/
>> wireless/ath/ath11k/mac.c
>> index 1fadf5faafb8..106e2530b64e 100644
>> --- a/drivers/net/wireless/ath/ath11k/mac.c
>> +++ b/drivers/net/wireless/ath/ath11k/mac.c
>> @@ -4317,6 +4317,40 @@ static int ath11k_clear_peer_keys(struct
>> ath11k_vif *arvif,
>> return first_errno;
>> }
>> +static int ath11k_set_group_keys(struct ath11k_vif *arvif)
>> +{
>> + struct ath11k *ar = arvif->ar;
>> + struct ath11k_base *ab = ar->ab;
>> + const u8 *addr = arvif->bssid;
>> + int i, ret, first_errno = 0;
>> + struct ath11k_peer *peer;
>> +
>> + spin_lock_bh(&ab->base_lock);
>> + peer = ath11k_peer_find(ab, arvif->vdev_id, addr);
>> + spin_unlock_bh(&ab->base_lock);
>> +
>> + if (!peer)
>> + return -ENOENT;
>> +
>> + for (i = 0; i < ARRAY_SIZE(peer->keys); i++) {
>> + struct ieee80211_key_conf *key = peer->keys[i];
>> +
>> + if (!key || (key->flags & IEEE80211_KEY_FLAG_PAIRWISE))
>> + continue;
>> +
>> + ret = ath11k_install_key(arvif, key, SET_KEY, addr,
>> + WMI_KEY_GROUP);
>> + if (ret < 0 && first_errno == 0)
>> + first_errno = ret;
>> +
>> + if (ret < 0)
>> + ath11k_warn(ab, "failed to set group key of idx %d for
>> vdev %d: %d\n",
>> + i, arvif->vdev_id, ret);
>> + }
>> +
>> + return first_errno;
>> +}
>> +
>> static int ath11k_mac_op_set_key(struct ieee80211_hw *hw, enum
>> set_key_cmd cmd,
>> struct ieee80211_vif *vif, struct ieee80211_sta *sta,
>> struct ieee80211_key_conf *key)
>> @@ -4326,6 +4360,7 @@ static int ath11k_mac_op_set_key(struct
>> ieee80211_hw *hw, enum set_key_cmd cmd,
>> struct ath11k_vif *arvif = ath11k_vif_to_arvif(vif);
>> struct ath11k_peer *peer;
>> struct ath11k_sta *arsta;
>> + bool is_ap_with_no_sta;
>> const u8 *peer_addr;
>> int ret = 0;
>> u32 flags = 0;
>> @@ -4386,16 +4421,57 @@ static int ath11k_mac_op_set_key(struct
>> ieee80211_hw *hw, enum set_key_cmd cmd,
>> else
>> flags |= WMI_KEY_GROUP;
>> - ret = ath11k_install_key(arvif, key, cmd, peer_addr, flags);
>> - if (ret) {
>> - ath11k_warn(ab, "ath11k_install_key failed (%d)\n", ret);
>> - goto exit;
>> - }
>> + ath11k_dbg(ar->ab, ATH11K_DBG_MAC,
>> + "%s for peer %pM on vdev %d flags 0x%X, type = %d, num_sta
>> %d\n",
>> + cmd == SET_KEY ? "SET_KEY" : "DEL_KEY", peer_addr, arvif-
>> >vdev_id,
>> + flags, arvif->vdev_type, arvif->num_stations);
>> +
>> + /* Allow group key clearing only in AP mode when no stations are
>> + * associated. There is a known race condition in firmware where
>> + * group addressed packets may be dropped if the key is cleared
>> + * and immediately set again during rekey.
>> + *
>> + * During GTK rekey, mac80211 issues a clear key (if the old key
>> + * exists) followed by an install key operation for same key
>> + * index. This causes ath11k to send two WMI commands in quick
>> + * succession: one to clear the old key and another to install the
>> + * new key in the same slot.
>> + *
>> + * Under certain conditions—especially under high load or time
>> + * sensitive scenarios, firmware may process these commands
>> + * asynchronously in a way that firmware assumes the key is
>> + * cleared whereas hardware has a valid key. This inconsistency
>> + * between hardware and firmware leads to group addressed packet
>> + * drops after rekey.
>> + * Only setting the same key again can restore a valid key in
>> + * firmware and allow packets to be transmitted.
>> + *
>> + * There is a use case where an AP can transition from Secure mode
>> + * to open mode without a vdev restart by just deleting all
>> + * associated peers and clearing key, Hence allow clear key for
>> + * that case alone. Mark arvif->reinstall_group_keys in such cases
>> + * and reinstall the same key when the first peer is added,
>> + * allowing firmware to recover from the race if it had occurred.
>> + */
>> - ret = ath11k_dp_peer_rx_pn_replay_config(arvif, peer_addr, cmd,
>> key);
>> - if (ret) {
>> - ath11k_warn(ab, "failed to offload PN replay detection %d\n",
>> ret);
>> - goto exit;
>> + is_ap_with_no_sta = (vif->type == NL80211_IFTYPE_AP &&
>> + !arvif->num_stations);
>> + if ((flags & WMI_KEY_PAIRWISE) || cmd == SET_KEY ||
>> is_ap_with_no_sta) {
>> + ret = ath11k_install_key(arvif, key, cmd, peer_addr, flags);
>> + if (ret) {
>> + ath11k_warn(ab, "ath11k_install_key failed (%d)\n", ret);
>> + goto exit;
>> + }
>> +
>> + ret = ath11k_dp_peer_rx_pn_replay_config(arvif, peer_addr,
>> cmd, key);
>> + if (ret) {
>> + ath11k_warn(ab, "failed to offload PN replay detection
>> %d\n",
>> + ret);
>> + goto exit;
>> + }
>> +
>> + if ((flags & WMI_KEY_GROUP) && cmd == SET_KEY &&
>> is_ap_with_no_sta)
>> + arvif->reinstall_group_keys = true;
>> }
>> spin_lock_bh(&ab->base_lock);
>> @@ -4994,6 +5070,7 @@ static int ath11k_mac_inc_num_stations(struct
>> ath11k_vif *arvif,
>> return -ENOBUFS;
>> ar->num_stations++;
>> + arvif->num_stations++;
>> return 0;
>> }
>> @@ -5009,6 +5086,7 @@ static void ath11k_mac_dec_num_stations(struct
>> ath11k_vif *arvif,
>> return;
>> ar->num_stations--;
>> + arvif->num_stations--;
>> }
>> static u32 ath11k_mac_ieee80211_sta_bw_to_wmi(struct ath11k *ar,
>> @@ -9540,6 +9618,21 @@ static int ath11k_mac_station_add(struct ath11k
>> *ar,
>> goto exit;
>> }
>> + /* Driver allows the DEL KEY followed by SET KEY sequence for
>> + * group keys for only when there is no clients associated, if at
>> + * all firmware has entered the race during that window,
>> + * reinstalling the same key when the first sta connects will allow
>> + * firmware to recover from the race.
>> + */
>> + if (arvif->num_stations == 1 && arvif->reinstall_group_keys) {
>> + ath11k_dbg(ab, ATH11K_DBG_MAC, "set group keys on 1st station
>> add for vdev %d\n",
>> + arvif->vdev_id);
>> + ret = ath11k_set_group_keys(arvif);
>> + if (ret)
>> + goto dec_num_station;
>> + arvif->reinstall_group_keys = false;
>> + }
>> +
>> arsta->rx_stats = kzalloc(sizeof(*arsta->rx_stats), GFP_KERNEL);
>> if (!arsta->rx_stats) {
>> ret = -ENOMEM;
>>
>> base-commit: 4cedae6335644a5858e1bc2c367aedc10482b654
>
--
--
Ramesh
More information about the ath11k
mailing list