Re: ath11k: WCN6855 WoWLAN resume leaves RX in unrecoverable reorder state → TCP collapses
Baochen Qiang
baochen.qiang at oss.qualcomm.com
Mon May 25 18:51:06 PDT 2026
On 5/25/2026 9:24 PM, Hauke Mehrtens wrote:
> I used AI to help me debug this problem.
>
> On Lenovo ThinkPad P14s G4 AMD (QCNFA765 / WCN6855 hw2.1), ~1 in 10
> suspend/resume cycles leaves the ath11k RX path delivering MSDUs out of
> order (~16% of TCP segments). TCP cwnd stays at 1-3 MSS and goodput
> collapses to ~3 Mbit/s; UDP on the same link in the same minute pushes
> 100+ Mbit/s.
>
> This machine is in the DMI quirk list at
> `drivers/net/wireless/ath/ath11k/core.c` that forces `ATH11K_PM_WOW`.
> In WOW mode the firmware is kept alive across suspend; the WOW resume
> path does not re-initialise REO HW or per-TID BA state.
> The PM_WOW quirk was added as a workaround for unexpected-wakeup bug
> https://bugzilla.kernel.org/show_bug.cgi?id=219196
>
> ## Affected components
>
> - **Driver:** ath11k_pci
> - **Chip:** WCN6855 hw2.1 (`17cb:1103`, QCNFA765)
> - **Firmware:** `WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41`
> (fw_version `0x11088c35`, 2024-04-17)
> - **Kernel:** observed on 7.0.9-arch1-1; also present across many
> earlier kernel versions over the past >1 year on this hardware
> - **Machine:** Lenovo ThinkPad P14s G4 AMD, DMI `21K5CTO1WW` (matches
> quirk entry "P14s G4 AMD #1" at `core.c:961-966` via `"21K5"`
> substring)
>
> ## Reproduce
>
> 1. Associate to an HE AP (characterised at 6 GHz, HE-MCS 5/6 NSS 2
> 160 MHz, -56 dBm, using MT7915 with OpenWrt 25.12).
> 2. Suspend, wake, test `iperf3` TCP. Repeat. On average within ~10
> cycles, one resume leaves the link broken.
> 3. In the broken state: `iw dev wlpXsY link` still reports ~1.3 Gbit/s
> "bitrate". Ping and UDP iperf3 look fine. TCP iperf3 collapses to
> ~3 Mbit/s with cwnd stuck at 1-3 MSS.
>
> ## Evidence
>
> ### iperf3, same link same minute
>
> ```
> AP -> STA, UDP -b 200M -l 1400 -t 15:
> sender: 200 Mbit/s, 267876 datagrams
> receiver: 102 Mbit/s, 137290 received, 130585 "lost"
> (not real loss; iperf3 UDP counts out-of-window arrivals as lost)
>
> AP -> STA, TCP -t 15:
> 3.43 Mbit/s, 521 retransmits, cwnd 1.41-5.66 KB throughout
> ```
>
> ### UDP run: no real loss anywhere
>
> - `ip -s link` delta: `+267,953 packets`, `0 errors`, `0 dropped`
> (AP sent 267,876).
> - `/proc/net/snmp` Udp: `RcvbufErrors 0, InErrors 0`.
> - ath11k `pdev_stats` delta: `MSDUs delivered to HTT +267,985`.
> - `soc_dp_stats` entirely zero: no RXDMA / REO / HAL / TCL / backpressure
> errors of any kind.
> - AP `iw station get`: ~1.3% retry rate, -65 dBm ACK signal,
> `expected throughput 1049 Mbps`.
>
> → Air link clean. Host data path clean. Firmware delivered every
> datagram. No drops anywhere.
>
> ### TCP socket reorder (`ss -tin` once per second during TCP iperf3)
>
> ```
> t (s) bytes_rx segs_in rcv_ooopack
> 0 1,291,653 895 158
> 1 1,717,365 1,189 210
> 2 2,060,541 1,426 274
> 3 2,519,557 1,743 335
> 4 3,050,973 2,110 397
> 5 3,446,277 2,383 450
> 6 3,906,741 2,701 513
> ```
>
> ~60 ooo packets/s out of ~370 segs/s = **~16% out-of-order**, sustained.
>
> ### Packet-level pattern (`tcpdump` on wlpXsY)
>
> Seq normalised to 0 at flow start:
>
> ```
> 22 ms 2896:4344
> 25 ms 4344:5792
> 27 ms 1448:2896 <-- late; fills gap from 5 ms earlier
> 28 ms 5792:7240
> 54 ms 8688:10136
> 55 ms 10136:11584
> 57 ms 7240:8688 <-- late
> 107 ms 26064:27512
> 107 ms 28960:30408
> 108 ms 30408:31856
> 109 ms 27512:28960 <-- late
> 156 ms 57920:59368
> 156 ms 59368:60816
> 157 ms 56472:57920 <-- late
> ```
>
> Fingerprint: A-MPDU subframe lost on first transmission, retried, retry
> arrives 2-5 ms later. Working REO HW would buffer the continuation
> until the missing subframe arrived or the per-TID reorder timeout
> (`HAL_DEFAULT_REO_TIMEOUT_USEC`, 40 ms) expired. Here both continuation
> and retry pass through unordered.
>
> ## Diagnosis
>
> - Air link healthy; host data path clean; REO HW error counters all
> zero — REO simply isn't enforcing order for this peer's TIDs.
> - dmesg across 3 days of suspend cycles shows zero ath11k re-init
> activity (no `fw_version` reprint, no `wcn6855 hw2.1` reprint). The
> firmware instance is the same one from the most recent `modprobe`.
> `ath11k_core_suspend_wow` / `ath11k_core_resume_wow` neither power
> down the device nor re-initialise REO.
> - `rmmod` triggers full `ath11k_hif_power_down` + chip re-init on next
> `modprobe`, which re-runs `ath11k_hw_wcn6855_reo_setup`. This is the
> only reliable recovery, so the corrupted state lives in firmware /
> REO HW that the WOW resume path never touches.
>
> The non-WOW path (`ath11k_core_suspend_default`) does power-down + full
> re-init on resume, re-running `ath11k_dp_srng_common_setup()` →
> `hw_ops->reo_setup()`. The WOW path does not.
>
> ## Related
>
> - Bug #219196 — unexpected wakeups; the WOW workaround was added to
> mitigate this
> - `ce8669a27016` — introduced the WOW quirk table (2025-03-28, Baochen Qiang)
> - `0eb002c93c3b` — added `21K5` / `21K6` (this laptop) to the quirk
> table (2025-09-29, Mark Pearson)
> - `4015b1972763` — adds Z13/Z16 Gen1 to WOW quirk (Nov 2025)
>
> Hauke
>
Can you please try the fix below?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a2451a34afdf563b3102d36a4b6cf335cf813e2
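
When retesting, the late-segment pattern in the quoted tcpdump excerpt can be checked mechanically rather than by eye. The following is only a minimal sketch, assuming the capture has already been reduced to (time_ms, seq_start, seq_end) tuples. The function name `find_late_segments` and the tuple layout are hypothetical, chosen for illustration, and are not part of tcpdump or any ath11k tooling. A segment is flagged as late when it starts below the highest sequence number already seen, which is what a receiver sees when a retried subframe is delivered after its successors.

```python
from typing import List, Tuple

# (arrival time in ms, seq start, seq end); hypothetical layout for this sketch
Segment = Tuple[int, int, int]


def find_late_segments(trace: List[Segment]) -> List[Segment]:
    """Return segments that arrive after data with a higher sequence number."""
    late: List[Segment] = []
    highest_end = 0  # highest seq end seen so far in arrival order
    for seg in trace:
        _, start, end = seg
        if start < highest_end:
            # This segment fills a hole behind data that was already delivered.
            late.append(seg)
        highest_end = max(highest_end, end)
    return late


# Values copied from the quoted tcpdump excerpt (seq normalised to 0).
TRACE: List[Segment] = [
    (22, 2896, 4344),
    (25, 4344, 5792),
    (27, 1448, 2896),
    (28, 5792, 7240),
    (54, 8688, 10136),
    (55, 10136, 11584),
    (57, 7240, 8688),
    (107, 26064, 27512),
    (107, 28960, 30408),
    (108, 30408, 31856),
    (109, 27512, 28960),
    (156, 57920, 59368),
    (156, 59368, 60816),
    (157, 56472, 57920),
]

if __name__ == "__main__":
    for t_ms, start, end in find_late_segments(TRACE):
        print(f"{t_ms} ms  {start}:{end}  late")
```

On the quoted excerpt this flags the same four segments that are marked "late" in the report. A forward gap alone is not flagged, only the segment that later fills it, so on a link where the receive path reorders correctly the list should stay empty.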