Re: ath11k: WCN6855 WoWLAN resume leaves RX in unrecoverable reorder state → TCP collapses

Baochen Qiang baochen.qiang at oss.qualcomm.com
Mon May 25 18:51:06 PDT 2026



On 5/25/2026 9:24 PM, Hauke Mehrtens wrote:
> I used AI to help me debug this problem.
> 
> On Lenovo ThinkPad P14s G4 AMD (QCNFA765 / WCN6855 hw2.1), ~1 in 10
> suspend/resume cycles leaves the ath11k RX path delivering MSDUs out of
> order (~16% of TCP segments). TCP cwnd stays at 1-3 MSS and goodput
> collapses to ~3 Mbit/s; UDP on the same link in the same minute pushes
> 100+ Mbit/s.
> 
> This machine is in the DMI quirk list at
> `drivers/net/wireless/ath/ath11k/core.c` that forces `ATH11K_PM_WOW`.
> In WOW mode the firmware is kept alive across suspend; the WOW resume
> path does not re-initialise REO HW or per-TID BA state.
> The PM_WOW quirk was added as a workaround for the unexpected-wakeup bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=219196
> 
> ## Affected components
> 
> - **Driver:** ath11k_pci
> - **Chip:** WCN6855 hw2.1 (`17cb:1103`, QCNFA765)
> - **Firmware:** `WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41`
>   (fw_version `0x11088c35`, 2024-04-17)
> - **Kernel:** observed on 7.0.9-arch1-1; also present across many
>   earlier kernel versions over the past >1 year on this hardware
> - **Machine:** Lenovo ThinkPad P14s G4 AMD, DMI `21K5CTO1WW` (matches
>   quirk entry "P14s G4 AMD #1" at `core.c:961-966` via `"21K5"`
>   substring)
> 
> ## Reproduce
> 
> 1. Associate to an HE AP (characterised at 6 GHz, HE-MCS 5/6 NSS 2
>    160 MHz, -56 dBm, using MT7915 with OpenWrt 25.12).
> 2. Suspend, wake, test `iperf3` TCP. Repeat. On average within ~10
>    cycles, one resume leaves the link broken.
> 3. In the broken state: `iw dev wlpXsY link` still reports ~1.3 Gbit/s
>    "bitrate". Ping and UDP iperf3 look fine. TCP iperf3 collapses to
>    ~3 Mbit/s with cwnd stuck at 1-3 MSS.
> 
> ## Evidence
> 
> ### iperf3, same link same minute
> 
> ```
> AP -> STA, UDP -b 200M -l 1400 -t 15:
>   sender:   200 Mbit/s, 267876 datagrams
>   receiver: 102 Mbit/s, 137290 received, 130585 "lost"
>   (not real loss; iperf3 UDP counts out-of-window arrivals as lost)
> 
> AP -> STA, TCP -t 15:
>   3.43 Mbit/s, 521 retransmits, cwnd 1.41-5.66 KB throughout
> ```
> 
> ### UDP run: no real loss anywhere
> 
> - `ip -s link` delta: `+267,953 packets`, `0 errors`, `0 dropped`
>   (AP sent 267,876).
> - `/proc/net/snmp` Udp: `RcvbufErrors 0, InErrors 0`.
> - ath11k `pdev_stats` delta: `MSDUs delivered to HTT +267,985`.
> - `soc_dp_stats` entirely zero: no RXDMA / REO / HAL / TCL / backpressure
>   errors of any kind.
> - AP `iw station get`: ~1.3% retry rate, -65 dBm ACK signal,
>   `expected throughput 1049 Mbps`.
> 
> → Air link clean. Host data path clean. Firmware delivered every
> datagram. No drops anywhere.
> 
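To make the "no real loss" argument easy to re-check, here is a minimal Python sketch that compares the counters quoted above. The constant and function names are my own labels, not iperf3 or kernel identifiers; the numbers are copied from the report.

```python
# Counters copied from the UDP run quoted above.
SENT_BY_AP = 267_876       # datagrams the iperf3 sender reported
IPERF_RECEIVED = 137_290   # datagrams iperf3 counted as received
IPERF_LOST = 130_585       # datagrams iperf3 counted as "lost"
IFACE_RX_DELTA = 267_953   # `ip -s link` RX packet delta on the STA


def summarize(sent, received, lost, iface_rx):
    """Compare what iperf3 reports with what the interface actually saw."""
    return {
        # received + lost should roughly add up to what was sent
        "accounted": received + lost,
        # packets seen by the interface beyond the iperf3 datagrams
        "iface_surplus": iface_rx - sent,
        # share of datagrams iperf3 labels as lost
        "iperf_lost_fraction": lost / sent,
    }


summary = summarize(SENT_BY_AP, IPERF_RECEIVED, IPERF_LOST, IFACE_RX_DELTA)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The interface counted at least as many packets as the AP sent (the small surplus is presumably unrelated traffic), while iperf3 labels just under half of the datagrams as lost. That fits reordering rather than drops.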
> ### TCP socket reorder (`ss -tin` once per second during TCP iperf3)
> 
> ```
>    t (s)    bytes_rx   segs_in   rcv_ooopack
>    0        1,291,653       895          158
>    1        1,717,365     1,189          210
>    2        2,060,541     1,426          274
>    3        2,519,557     1,743          335
>    4        3,050,973     2,110          397
>    5        3,446,277     2,383          450
>    6        3,906,741     2,701          513
> ```
> 
> ~60 ooo packets/s out of ~370 segs/s = **~16% out-of-order**, sustained.
> 
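For reference, the rates can be recomputed from the table like this. It is only a sketch over the seven samples shown, and the helper name `ooo_rates` is mine.

```python
# Samples copied from the `ss -tin` table above: (t_seconds, segs_in, rcv_ooopack).
SAMPLES = [
    (0, 895, 158),
    (1, 1189, 210),
    (2, 1426, 274),
    (3, 1743, 335),
    (4, 2110, 397),
    (5, 2383, 450),
    (6, 2701, 513),
]


def ooo_rates(samples):
    """Return (segments/s, out-of-order/s, out-of-order fraction) over the window."""
    (t0, segs0, ooo0), (t1, segs1, ooo1) = samples[0], samples[-1]
    dt = t1 - t0
    d_segs = segs1 - segs0
    d_ooo = ooo1 - ooo0
    return d_segs / dt, d_ooo / dt, d_ooo / d_segs


segs_per_s, ooo_per_s, fraction = ooo_rates(SAMPLES)
print(f"{segs_per_s:.0f} segs/s, {ooo_per_s:.0f} ooo/s, {fraction:.1%} out of order")
```

On these seven samples I get about 301 segs/s and 59 ooo/s, so closer to 20% than 16%. The ~370 segs/s figure presumably comes from a longer window than the excerpt. Either way the reorder rate is far above what TCP tolerates.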
> ### Packet-level pattern (`tcpdump` on wlpXsY)
> 
> Seq normalised to 0 at flow start:
> 
> ```
> 22 ms     2896:4344
> 25 ms     4344:5792
> 27 ms     1448:2896           <-- late; fills gap from 5 ms earlier
> 28 ms     5792:7240
> 54 ms     8688:10136
> 55 ms     10136:11584
> 57 ms     7240:8688           <-- late
> 107 ms    26064:27512
> 107 ms    28960:30408
> 108 ms    30408:31856
> 109 ms    27512:28960         <-- late
> 156 ms    57920:59368
> 156 ms    59368:60816
> 157 ms    56472:57920         <-- late
> ```
> 
> Fingerprint: A-MPDU subframe lost on first transmission, retried, retry
> arrives 2-5 ms later. Working REO HW would buffer the continuation
> until the missing subframe arrived or the per-TID reorder timeout
> (`HAL_DEFAULT_REO_TIMEOUT_USEC`, 40 ms) expired. Here both continuation
> and retry pass through unordered.
> 
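The late arrivals in the excerpt can be picked out mechanically. Below is a small Python sketch under these assumptions: the tuples are copied from the tcpdump excerpt above, timestamps are rounded to whole milliseconds, and `late_segments` is a hypothetical helper, not part of any tool.

```python
# (time_ms, seq_start, seq_end) copied from the tcpdump excerpt above.
TRACE = [
    (22, 2896, 4344), (25, 4344, 5792), (27, 1448, 2896), (28, 5792, 7240),
    (54, 8688, 10136), (55, 10136, 11584), (57, 7240, 8688),
    (107, 26064, 27512), (107, 28960, 30408), (108, 30408, 31856),
    (109, 27512, 28960),
    (156, 57920, 59368), (156, 59368, 60816), (157, 56472, 57920),
]


def late_segments(trace):
    """Return (time_ms, start, end, delay_ms) for every segment that arrives
    after a segment with a higher sequence number was already seen."""
    late = []
    seen = []          # segments already delivered, in arrival order
    highest_end = 0    # highest sequence number delivered so far
    for t, start, end in trace:
        if start < highest_end:
            # Time at which the first segment beyond this one was delivered.
            overtaken_at = min((ts for ts, s, _ in seen if s >= end), default=t)
            late.append((t, start, end, t - overtaken_at))
        highest_end = max(highest_end, end)
        seen.append((t, start, end))
    return late


for t, start, end, delay in late_segments(TRACE):
    print(f"{t} ms  {start}:{end}  late by {delay} ms")
```

This flags the same four segments marked "late" above, each only a few milliseconds behind the segment that overtook it. That is well inside the 40 ms reorder timeout, so a working reorder queue would have held the continuation back.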
> ## Diagnosis
> 
> - Air link healthy; host data path clean; REO HW error counters all
>   zero — REO simply isn't enforcing order for this peer's TIDs.
> - dmesg across 3 days of suspend cycles shows zero ath11k re-init
>   activity (no `fw_version` reprint, no `wcn6855 hw2.1` reprint). The
>   firmware instance is the same one from the most recent `modprobe`.
>   `ath11k_core_suspend_wow` / `ath11k_core_resume_wow` neither power
>   down the device nor re-initialise REO.
> - `rmmod` triggers full `ath11k_hif_power_down` + chip re-init on next
>   `modprobe`, which re-runs `ath11k_hw_wcn6855_reo_setup`. This is the
>   only reliable recovery, so the corrupted state lives in firmware /
>   REO HW that the WOW resume path never touches.
> 
> The non-WOW path (`ath11k_core_suspend_default`) does power-down + full
> re-init on resume, re-running `ath11k_dp_srng_common_setup()` →
> `hw_ops->reo_setup()`. The WOW path does not.
> 
> ## Related
> 
> - Bug #219196 — unexpected wakeups; the WOW workaround was added to
>   mitigate this
> - `ce8669a27016` — introduced the WOW quirk table (2025-03-28, Baochen Qiang)
> - `0eb002c93c3b` — added `21K5` / `21K6` (this laptop) to the quirk
>   table (2025-09-29, Mark Pearson)
> - `4015b1972763` — added Z13/Z16 Gen1 to the WOW quirk table (Nov 2025)
> 
> Hauke
> 

Can you please try the fix below?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a2451a34afdf563b3102d36a4b6cf335cf813e2



