[REGRESSION] ath10k: failed to flush transmit queue

James Prestwood prestwoj at gmail.com
Thu Feb 20 05:55:54 PST 2025


Hi All,

On 7/31/24 11:13 AM, Kalle Valo wrote:
> Felix Fietkau <nbd at nbd.name> writes:
>
>> On 12.07.24 04:23, Cedric Veilleux wrote:
>>
>>> AP mode.
>>> Both 2.4 and 5ghz channels.
>>> Using WLE600VX (QCA986x/988x), we are seeing the following errors in
>>> kernel logs:
>>> [12978.022077] ath10k_pci 0000:04:00.0: failed to flush transmit queue
>>> (skip 0 ar-state 1): 0
>>> [13343.069189] ath10k_pci 0000:04:00.0: failed to flush transmit queue
>>> (skip 0 ar-state 1): 0
>>> They are somewhat random but frequent. Can happen once a day or many
>>> times per hour.
>>> They are associated with 3-4 seconds of radio silence. Full packet
>>> loss. Then everything resumes normally, STA are still associated and
>>> traffic resumes.
>>> I have tested with major kernel versions:
>>> 6.1.97: stable (tested for many days on 10+ access points)
>>> 6.2.16: stable (tested for few hours single machine)
>>> 6.3.13: stable (tested for few hours single machine)
>>> 6.4.16: unstable  (we have errors within an hour)
>>> 6.5.13: unstable  (we have errors within an hour)
>>> 6.6.39: unstable  (we have errors within an hour)
>>> 6.7.12: unstable  (we have errors within an hour)
>>> 6.8.10: unstable  (we have errors within an hour)
>>> 6.9.7: unstable  (we have errors within an hour)
>>>   From these tests I believe something changed in 6.4 series causing
>>> instabilities and the dreaded "failed to flush transmit queue" error.
>>> This is a custom linux distribution. Only change is the kernel. All
>>> other packages are same versions. Everything rebuilt from source using
>>> bitbake/yocto. Same linux-firmware files.
>> I'm pretty sure it's caused by this commit:
>>
>> commit 0b75a1b1e42e07ae84e3a11d2368b418546e2bec
>> Author: Johannes Berg <johannes.berg at intel.com>
>> Date:   Fri Mar 31 16:59:16 2023 +0200
>>
>>      wifi: mac80211: flush queues on STA removal
>>
>> I guess somebody needs to look into making the queue flush on ath10k
>> more reliable (or even better, implement a more lightweight .flush_sta
>> op).
>>
>> I don't have time to do the work myself, but hopefully this
>> information could help somebody else take care of it.
> Adding ath10k list so that everyone see this.

I want to revive this thread and provide some additional data. This
does not happen only in AP mode, or only with the hardware mentioned.
After upgrading from 6.2 to 6.8 we started seeing it on client devices
running the QCA6174 hw 3.2 firmware ver WLAN.RM.4.4.1-00288- api 6. We
see it during disconnects, which isn't as big of a deal. The more
concerning case is during roams, where it makes roams go from less
than 200ms to over 5 seconds.
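
For anyone who wants to check whether their own logs show the same
pattern, here is a minimal sketch that pulls the warning out of kernel
log text and reports the time between occurrences. The regex is built
only from the two log lines quoted above; the function names and the
"tail" field name are mine, and I'm assuming the trailing number is
simply whatever the driver prints after the colon.

```python
import re

# Pattern derived from the log lines quoted in this thread, e.g.
# [12978.022077] ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
FLUSH_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<dev>\S+):\s+"
    r"failed to flush transmit queue\s+"
    r"\(skip (?P<skip>\d+) ar-state (?P<state>\d+)\): (?P<tail>-?\d+)"
)


def find_flush_failures(log_text):
    """Return one dict per flush-failure message found in log_text."""
    events = []
    for m in FLUSH_RE.finditer(log_text):
        events.append({
            "ts": float(m.group("ts")),      # kernel timestamp in seconds
            "dev": m.group("dev"),           # PCI address of the radio
            "skip": int(m.group("skip")),
            "state": int(m.group("state")),
            "tail": int(m.group("tail")),    # trailing number, meaning assumed
        })
    return events


def gaps(events):
    """Seconds between consecutive failures, in log order."""
    return [b["ts"] - a["ts"] for a, b in zip(events, events[1:])]
```

Feeding it the output of dmesg (or a saved log file) and printing
gaps() should make it obvious whether the failures line up with
disconnects and roams.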

Based on this report I have tried using Remi's set of patches [1],
which implement flush_sta(), but we end up with the same ~5 second
hang, just in ath10k_flush_sta() instead of ath10k_flush(). I'm unsure
whether this is a firmware problem or some race within the driver
itself. In the past I have reduced timeouts [2] to work around this
type of thing, but it's really just a band-aid.

I would agree that this was "introduced" by Johannes' commit above, but
the original commit does make sense. This is just an ath10k problem
with flushing the queues.

At this point I'm really left with two options:

  - Revert Johannes' commit that flushes the queues on STA removal,
thereby reducing security, OR

  - Reduce the timeout from 5 seconds to something more manageable, like 
1 second (hopefully someone more in the know can comment here).
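
To illustrate why the second option only bounds the stall rather than
fixing it, here is a small userspace model of a flush that waits for a
queue to drain with a timeout. This is not driver code: flush() and the
Event standing in for the "queue empty" condition are hypothetical, and
the timeouts are scaled down so it runs quickly.

```python
import threading
import time


def flush(queue_empty, timeout_s):
    """Wait until queue_empty is set or timeout_s elapses.

    Returns (drained, seconds_waited). drained is False on timeout.
    """
    start = time.monotonic()
    drained = queue_empty.wait(timeout=timeout_s)
    return drained, time.monotonic() - start


# An Event that is never set models a transmit queue that never drains.
stuck = threading.Event()

for timeout_s in (0.5, 0.1):  # stand-ins for 5 seconds and 1 second
    drained, waited = flush(stuck, timeout_s)
    print(f"timeout={timeout_s}s drained={drained} stalled={waited:.2f}s")
```

When the queue never drains, the caller stalls for the full timeout
either way; a shorter timeout shortens the hang but the flush still
fails.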

Has anyone else looked at this regression, or found a workaround
other than my options above?

Thanks,

James

[1] 
https://lore.kernel.org/linux-wireless/17d26d6a3e80ff03939ee7935fdc07f979b61a4f.1732293922.git.repk@triplefau.lt/

[2] 
https://lore.kernel.org/linux-wireless/20240814164507.996303-2-prestwoj@gmail.com/



