ath10k "failed to install key for vdev 0 peer <mac>: -110"
Baochen Qiang
quic_bqiang at quicinc.com
Mon Nov 25 18:56:06 PST 2024
On 11/25/2024 9:32 PM, James Prestwood wrote:
> Hi Baochen,
>
> On 9/4/24 6:46 PM, Baochen Qiang wrote:
>>
>> On 9/5/2024 2:03 AM, Jeff Johnson wrote:
>>> On 8/16/2024 5:04 AM, James Prestwood wrote:
>>>> Hi Baochen,
>>>>
>>>> On 8/16/24 3:19 AM, Baochen Qiang wrote:
>>>>> On 7/12/2024 9:11 PM, James Prestwood wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've seen this error mentioned on random forum posts, but it's always associated with
>>>>>> a kernel crash/warning or some very obvious negative behavior. I've noticed this
>>>>>> occasionally and at one location very frequently during FT roaming, specifically
>>>>>> just after CMD_ASSOCIATE is issued. For our company-run networks I'm not seeing any
>>>>>> negative behavior apart from a 3 second delay in sending the re-association frame
>>>>>> since the kernel waits for this timeout. But we have some networks our clients run
>>>>>> on that we do not own (different vendor), and we are seeing association timeouts
>>>>>> after this error occurs and in some cases the AP is sending a deauthentication with
>>>>>> reason code 8 instead of replying with a reassociation reply and an error status,
>>>>>> which is quite odd.
>>>>>>
>>>>>> We are chasing this down with the vendor of these APs as well, but the behavior
>>>>>> always happens after we see this key removal failure/timeout on the client side. So
>>>>>> it would appear there is potentially a problem on both the client and AP. My guess
>>>>>> is _something_ about the re-association frame changes when this error is
>>>>>> encountered, but I cannot see how that would be the case. We are working to get
>>>>>> PCAPs now, but it's through a 3rd party, so that timing is out of my control.
>>>>>>
>>>>>> From the kernel code this error would appear innocuous: the old key is failing to
>>>>>> be removed, but it gets immediately replaced by the new key, and we don't see that
>>>>>> addition failing. Am I understanding that logic correctly? I.e. this logic:
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503
>>>>>>
>>>>>> Below are a few kernel logs of the issue happening, some with the deauth being sent
>>>>>> by the AP, some with just timeouts:
>>>>>>
>>>>>> --- No deauth frame sent, just association timeouts after the error ---
>>>>>>
>>>>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to
>>>>>> <new BSS>
>>>>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0
>>>>>> peer <previous BSS>: -110
>>>>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from
>>>>>> hardware (-110)
>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
>>>>>> Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
>>>>>> Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticated
>>>>>> Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0
>>>>>> aid=16)
>>>>>> Jul 11 00:05:36 kernel: wlan0: associated
>>>>>>
>>>>>> --- Deauth frame sent amidst the association timeouts ---
>>>>>>
>>>>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to
>>>>>> <new BSS>
>>>>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0
>>>>>> peer <previous BSS>: -110
>>>>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from
>>>>>> hardware (-110)
>>>>>> Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating
>>>>>> (Reason: 8=DISASSOC_STA_HAS_LEFT)
>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
>>>>>> Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticated
>>>>>> Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0
>>>>>> aid=101)
>>>>>> Jul 11 00:43:24 kernel: wlan0: associated
>>>>>>
>>>>> Hi James, this is QCA6174, right? Could you also share the firmware version?
>>>> Yep, using:
>>>>
>>>> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
>>>> firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp
>>>> crc32 bf907c7c
>>>>
>>>> I did try in one instance the latest firmware, 309, and still saw the
>>>> same behavior but 288 is what all our devices are running.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>> Baochen, are you looking more into this? Would prefer to fix the root cause
>>> rather than take "[RFC 0/1] wifi: ath10k: improvement on key removal failure"
>> I asked the CST team to try to reproduce this issue so that we can get a firmware dump for
>> further debugging. What I got is that the CST team is currently busy with other critical
>> schedules, and they plan to debug this ath10k issue after those schedules are finished.
>
> Any movement on this front? We are still carrying that RFC patch to work around the
> associated compatibility issues with Cisco APs when this timeout occurs.
I asked the test team again; the response is that hopefully they can get bandwidth next week.
>
> While I do agree the RFC patch isn't optimal, trying to get a firmware fix for ~6-year-old
> hardware also may not be very easy. FWIW, we've been running the RFC patch for about 3
> months now; as of today it's running on over 4000 client devices. So IMO the patch itself
> is safe if there was any concern.
Thanks for the info.
>
> Thanks,
>
> James
>
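
For anyone triaging similar logs: the -110 in the messages quoted above is the negated Linux errno ETIMEDOUT (110), and the timestamps show roughly 3 seconds between the "disconnect from AP" line and the key failure. Below is a minimal sketch, not part of the RFC patch or the driver, that pulls that delay out of journal-style kernel log lines. The function name key_timeout_delays, the year default, and the assumption that each log entry sits on a single unwrapped line with English month abbreviations are all mine.

```python
import re
from datetime import datetime

# Assumption: one log entry per line, formatted like the excerpts in this thread
# ("Jul 11 00:05:30 kernel: <message>"), with English month abbreviations.
LINE_RE = re.compile(r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) kernel: (?P<msg>.*)$")

ETIMEDOUT = 110  # Linux errno value; the driver logs it negated, as -110


def key_timeout_delays(lines, year=2024):
    """Return the seconds between each 'disconnect from AP' line and the
    next ath10k 'failed to install key ... -110' line that follows it.

    The year is not present in the log lines, so it is a caller-supplied
    assumption used only to build a full timestamp.
    """
    delays = []
    disconnect_at = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like kernel log entries
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        msg = m.group("msg")
        if "disconnect from AP" in msg:
            disconnect_at = ts
        elif (
            disconnect_at is not None
            and "failed to install key" in msg
            and msg.endswith(f": -{ETIMEDOUT}")
        ):
            delays.append((ts - disconnect_at).total_seconds())
            disconnect_at = None  # pair each disconnect with one failure only
    return delays


# Sample taken from the first log excerpt in this thread, with wrapped lines rejoined.
SAMPLE = [
    "Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>",
    "Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110",
    "Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)",
    "Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)",
]

if __name__ == "__main__":
    print(key_timeout_delays(SAMPLE))
```

Feeding it the output of journalctl -k (one entry per line) gives a quick count of how often the timeout precedes a roam and how long each one stalls; it does not say anything about why the firmware fails to respond.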