ath10k "failed to install key for vdev 0 peer <mac>: -110"
James Prestwood
prestwoj at gmail.com
Mon Dec 9 04:37:42 PST 2024
On 12/8/24 10:48 PM, Baochen Qiang wrote:
>
> On 12/6/2024 8:27 PM, James Prestwood wrote:
>> Hi Baochen,
>>
>> On 12/5/24 6:47 PM, Baochen Qiang wrote:
>>> On 9/5/2024 9:46 AM, Baochen Qiang wrote:
>>>> On 9/5/2024 2:03 AM, Jeff Johnson wrote:
>>>>> On 8/16/2024 5:04 AM, James Prestwood wrote:
>>>>>> Hi Baochen,
>>>>>>
>>>>>> On 8/16/24 3:19 AM, Baochen Qiang wrote:
>>>>>>> On 7/12/2024 9:11 PM, James Prestwood wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've seen this error mentioned on random forum posts, but it's always associated
>>>>>>>> with a kernel crash/warning or some very obvious negative behavior. I've noticed
>>>>>>>> this occasionally and at one location very frequently during FT roaming,
>>>>>>>> specifically just after CMD_ASSOCIATE is issued. For our company run networks I'm
>>>>>>>> not seeing any negative behavior apart from a 3 second delay in sending the re-
>>>>>>>> association frame since the kernel waits for this timeout. But we have some
>>>>>>>> networks our clients run on that we do not own (different vendor), and we are
>>>>>>>> seeing association timeouts after this error occurs and in some cases the AP is
>>>>>>>> sending a deauthentication with reason code 8 instead of replying with a
>>>>>>>> reassociation reply and an error status, which is quite odd.
>>>>>>>>
>>>>>>>> We are chasing down this with the vendor of these APs as well, but the behavior
>>>>>>>> always happens after we see this key removal failure/timeout on the client side. So
>>>>>>>> it would appear there is potentially a problem on both the client and AP. My guess
>>>>>>>> is _something_ about the re-association frame changes when this error is
>>>>>>>> encountered, but I cannot see how that would be the case. We are working to get
>>>>>>>> PCAPs now, but it's through a 3rd party, so that timing is out of my control.
>>>>>>>>
>>>>>>>> From the kernel code this error would appear innocuous, the old key is failing to
>>>>>>>> be removed but it gets immediately replaced by the new key. And we don't see that
>>>>>>>> addition failing. Am I understanding that logic correctly? I.e. this logic:
>>>>>>>>
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/
>>>>>>>> mac80211/key.c#n503
>>>>>>>>
>>>>>>>> Below are a few kernel logs of the issue happening, some with the deauth being sent
>>>>>>>> by the AP, some with just timeouts:
>>>>>>>>
>>>>>>>> --- No deauth frame sent, just association timeouts after the error ---
>>>>>>>>
>>>>>>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to
>>>>>>>> <new BSS>
>>>>>>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0
>>>>>>>> peer <previous BSS>: -110
>>>>>>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from
>>>>>>>> hardware (-110)
>>>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
>>>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
>>>>>>>> Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
>>>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
>>>>>>>> Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticated
>>>>>>>> Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0
>>>>>>>> aid=16)
>>>>>>>> Jul 11 00:05:36 kernel: wlan0: associated
>>>>>>>>
>>>>>>>> --- Deauth frame sent amidst the association timeouts ---
>>>>>>>>
>>>>>>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to
>>>>>>>> <new BSS>
>>>>>>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0
>>>>>>>> peer <previous BSS>: -110
>>>>>>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from
>>>>>>>> hardware (-110)
>>>>>>>> Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating
>>>>>>>> (Reason: 8=DISASSOC_STA_HAS_LEFT)
>>>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
>>>>>>>> Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticated
>>>>>>>> Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0
>>>>>>>> aid=101)
>>>>>>>> Jul 11 00:43:24 kernel: wlan0: associated
>>>>>>>>
>>>>>>> Hi James, this is QCA6174, right? could you also share firmware version?
>>>>>> Yep, using:
>>>>>>
>>>>>> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
>>>>>> firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp
>>>>>> crc32 bf907c7c
>>>>>>
>>>>>> I did try in one instance the latest firmware, 309, and still saw the
>>>>>> same behavior but 288 is what all our devices are running.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> James
>>>>> Baochen, are you looking more into this? Would prefer to fix the root cause
>>>>> rather than take "[RFC 0/1] wifi: ath10k: improvement on key removal failure"
>>>> I asked CST team to try to reproduce this issue such that we can get firmware dump for
>>>> debug further. What I got is that CST team is currently busy at other critical
>>>> schedules and they are planning to debug this ath10k issue after those schedules get
>>>> finished.
>>>>
>>> Jeff, I am notified that CST team can not reproduce this issue.
>> Thanks for reaching out to them at least. Maybe the firmware team can provide some info
>> about how long it _should_ take to remove a key and we can make the timeout reflect that?
> are you implying that the failure is due to a not-long-enough wait in host driver? or you
> want to know the maximum time firmware needs in removing key, and if it is less than 3s we
> can reduce current timeout to WAR the issue you hit?
No, I'm not implying the wait isn't long enough. I would like to know the
maximum time the firmware should normally take to remove a key, and have
the driver wait only that long, which would fix the issues we see with
Cisco APs.
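
For anyone who wants to check their own logs for the same stall, below is a
rough Python sketch (a log-reading helper, not driver code) that measures the
gap between the "disconnect from AP" line and the following "failed to install
key ... -110" line. The sample input is the first few lines of the first log
excerpt quoted above, with the wrapped lines re-joined. The function name
key_removal_stalls and the year 2024 prepended to each timestamp are
assumptions made for illustration only; syslog stamps carry no year.

```python
import re
from datetime import datetime

# Sample taken from the first log excerpt in this thread (wrapped lines re-joined).
LOG = """\
Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
"""

# "<month> <day> <hh:mm:ss> kernel: <message>"
STAMP = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) kernel: (.*)$")


def key_removal_stalls(text, year=2024):
    """Return the seconds between each roam start and the key timeout after it.

    `year` is an assumption: it is only needed so the timestamps parse cleanly.
    """
    stalls = []
    roam_start = None
    for line in text.splitlines():
        m = STAMP.match(line)
        if not m:
            continue
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if "disconnect from AP" in msg:
            roam_start = when
        elif ("failed to install key" in msg and msg.endswith(": -110")
              and roam_start is not None):
            stalls.append((when - roam_start).total_seconds())
            roam_start = None
    return stalls


print(key_removal_stalls(LOG))  # prints [3.0]
```

On the excerpt above this reports a single 3-second gap, which matches the
delay described at the start of the thread.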
>
>> Thanks,
>>
>> James
>>
>>