[RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism
James Prestwood
prestwoj at gmail.com
Tue Nov 26 04:57:36 PST 2024
Hi Remi,
On 11/22/24 8:48 AM, Remi Pommarel wrote:
> It has been reported [0] that 3-4 seconds (actually up to 5 sec) of
> radio silence could be observed, followed by the error below, on ath10k
> devices:
>
> ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
>
> This is due to how the TX queues are flushed in ath10k. When a STA is
> removed, mac80211 needs to flush queues [1], but because ath10k does not
> have a lightweight .flush_sta operation, ieee80211_flush_queues() is
> called instead, effectively blocking the whole queue during the drain
> and causing this radio silence. Also, because ath10k_flush() waits for
> all queues to be emptied, not only the flushed ones, it could more easily
> take up to 5 seconds to finish, making the whole situation worse.
>
> The first patch of this series adds a .flush_sta operation to flush only
> specific STA traffic avoiding the need to stop whole queues and should
> be enough in itself to fix the reported issue.
>
> The second patch of this series is a proposal to improve ath10k_flush so
> that it will be less likely to time out waiting for unrelated queues to
> drain.
>
> The above kernel warning could still be observed (e.g. flushing a dead
> STA) but should now be harmless.
>
> [0]: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
> [1]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
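For anyone skimming the thread, here is a rough toy model (plain Python, nothing to do with the real driver code) of why waiting on every queue can take much longer than waiting on a single station. The function name, station labels, drain rates and tick-based timeout are all made up for illustration; it is only a sketch of the idea described in the cover letter, not how ath10k or mac80211 implement it.

```python
# Toy model only: none of these names exist in ath10k or mac80211.
def ticks_to_flush(pending, drain_rate, target=None, timeout=5000):
    """Return how many ticks pass before the flush wait ends.

    pending:    dict of station name -> queued frame count
    drain_rate: dict of station name -> frames completed per tick
    target:     station to wait for, or None to wait for every station
    timeout:    give up after this many ticks (hypothetical stand-in
                for the driver's flush timeout)
    """
    pending = dict(pending)  # work on a copy, leave the caller's dict alone
    watched = list(pending) if target is None else [target]
    ticks = 0
    while ticks < timeout and any(pending[sta] > 0 for sta in watched):
        # Every station keeps draining, whether or not we wait on it.
        for sta in pending:
            pending[sta] = max(0, pending[sta] - drain_rate[sta])
        ticks += 1
    return ticks


pending = {"leaving": 4, "busy": 50}
rate = {"leaving": 2, "busy": 1}
print(ticks_to_flush(pending, rate))             # wait on all stations
print(ticks_to_flush(pending, rate, "leaving"))  # wait on one station
```

With these invented numbers the whole-device wait is bounded by the busiest station, while the per-station wait ends as soon as the leaving station's frames are gone. A station that never drains (rate 0) simply runs into the timeout, which is the "dead STA" case mentioned above.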
The original report indicated this only affected AP mode, but after
seeing this and checking some of our clients I found that it also
happens in station mode. I only have clients on 6.2 and 6.8. I can
confirm it's not occurring on 6.2, but it is on 6.8. I also tried your
set of patches but did not notice any difference in behavior with or
without them. When it happens, it's always just after a roam scan: ~4
seconds go by and we get the failure followed by a "Connection to AP
<mac> lost". Oddly, the MAC address is all zeros.
Nov 25 09:09:50 iwd[16256]: src/station.c:station_start_roam() Using cached neighbor report for roam
Nov 25 09:09:54 kernel: ath10k_pci 0000:02:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_link_notify() event 16 on ifindex 7
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_deauthenticate_event()
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_disconnect_event()
Nov 25 09:09:54 iwd[16256]: Received Deauthentication event, reason: 4, from_ap: false
Nov 25 09:09:54 kernel: wlan0: Connection to AP 00:00:00:00:00:00 lost
Other times, the above logs are preceded by this:
Nov 26 00:25:25 kernel: ath10k_pci 0000:02:00.0: failed to flush sta txq (sta ca:55:b8:7a:91:4b skip 0 ar-state 1): 0
Note, the above logs are with your patches applied. Maybe this is a
separate issue? Or do you think it's related?
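In case it helps with reproducing the "~4 seconds" figure on other machines, below is a small Python sketch that measures the gap between a roam start and the next flush failure in journal output. The function name, the two marker strings and the year argument are my own choices (syslog-style timestamps carry no year, so 2024 is assumed), and it expects raw, unwrapped journal lines in an English locale rather than mail-wrapped ones.

```python
import re
from datetime import datetime

# Marker substrings taken from the log lines quoted in this thread.
ROAM = "station_start_roam()"
FLUSH_FAIL = "failed to flush transmit queue"
# Leading syslog-style timestamp, e.g. "Nov 25 09:09:50 ".
STAMP = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) ")


def roam_to_flush_gaps(lines, year=2024):
    """Return seconds between each roam start and the next flush failure."""
    gaps = []
    roam_at = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # no timestamp: wrapped continuation or unrelated text
        # The year is an assumption; the log lines do not include one.
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if ROAM in line:
            roam_at = when
        elif FLUSH_FAIL in line and roam_at is not None:
            gaps.append((when - roam_at).total_seconds())
            roam_at = None  # pair each roam with at most one failure
    return gaps


sample = [
    "Nov 25 09:09:50 iwd[16256]: src/station.c:station_start_roam() "
    "Using cached neighbor report for roam",
    "Nov 25 09:09:54 kernel: ath10k_pci 0000:02:00.0: failed to flush "
    "transmit queue (skip 0 ar-state 1): 0",
]
print(roam_to_flush_gaps(sample))
```

It deliberately ignores the "failed to flush sta txq" variant so that the two messages can be counted separately if needed.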
Thanks,
James
>
> V3:
> - Initialize empty to true to fix smatch error
>
> V2:
> - Add Closes tag
> - Use atomic instead of spinlock for per sta pending frame counter
> - Call ath10k_htt_tx_sta_dec_pending within rcu
> - Rename pending_per_queue[] to num_pending_per_queue[]
>
> Remi Pommarel (2):
> wifi: ath10k: Implement ieee80211 flush_sta callback
> wifi: ath10k: Flush only requested txq in ath10k_flush()
>
> drivers/net/wireless/ath/ath10k/core.h | 2 +
> drivers/net/wireless/ath/ath10k/htt.h | 11 +++-
> drivers/net/wireless/ath/ath10k/htt_tx.c | 49 +++++++++++++++-
> drivers/net/wireless/ath/ath10k/mac.c | 75 ++++++++++++++++++++----
> drivers/net/wireless/ath/ath10k/txrx.c | 11 ++--
> 5 files changed, 127 insertions(+), 21 deletions(-)
>