ath11k: WCN6855 WoWLAN resume leaves RX in unrecoverable reorder state → TCP collapses

Hauke Mehrtens hauke at hauke-m.de
Mon May 25 06:24:00 PDT 2026


I used AI to help me debug this problem.

On Lenovo ThinkPad P14s G4 AMD (QCNFA765 / WCN6855 hw2.1), ~1 in 10
suspend/resume cycles leaves the ath11k RX path delivering MSDUs out of
order (~16% of TCP segments). TCP cwnd stays at 1-3 MSS and goodput
collapses to ~3 Mbit/s; UDP on the same link in the same minute pushes
100+ Mbit/s.

This machine is on the DMI quirk list in
`drivers/net/wireless/ath/ath11k/core.c` that forces `ATH11K_PM_WOW`.
In WOW mode the firmware is kept alive across suspend, and the WOW resume
path does not re-initialise REO HW or per-TID BA state.
The PM_WOW quirk was added as a workaround for the unexpected-wakeup bug
https://bugzilla.kernel.org/show_bug.cgi?id=219196

## Affected components

- **Driver:** ath11k_pci
- **Chip:** WCN6855 hw2.1 (`17cb:1103`, QCNFA765)
- **Firmware:** `WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41`
   (fw_version `0x11088c35`, 2024-04-17)
- **Kernel:** observed on 7.0.9-arch1-1; also seen on many earlier
   kernel versions over more than a year on this hardware
- **Machine:** Lenovo ThinkPad P14s G4 AMD, DMI `21K5CTO1WW` (matches
   quirk entry "P14s G4 AMD #1" at `core.c:961-966` via `"21K5"`
   substring)

## Reproduce

1. Associate to an HE AP (characterised at 6 GHz, HE-MCS 5/6 NSS 2
    160 MHz, -56 dBm, using MT7915 with OpenWrt 25.12).
2. Suspend, wake, test `iperf3` TCP. Repeat. On average, about one
    resume in 10 leaves the link broken.
3. In the broken state: `iw dev wlpXsY link` still reports ~1.3 Gbit/s
    "bitrate". Ping and UDP iperf3 look fine. TCP iperf3 collapses to
    ~3 Mbit/s with cwnd stuck at 1-3 MSS.
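
Steps 2 and 3 can be automated. The following is a minimal sketch, not a tested harness. It assumes `rtcwake` and `iperf3` are installed, that an iperf3 server runs on the AP side at the placeholder address `192.0.2.1`, that 15 s is enough for the link to reassociate, and that 20 Mbit/s is a usable cut-off between healthy and broken. All function names are made up for illustration.

```python
import json
import subprocess
import time

SERVER = "192.0.2.1"       # placeholder: host on the AP side running `iperf3 -s`
BROKEN_BELOW_MBIT = 20.0   # assumed cut-off; the broken state showed ~3 Mbit/s


def goodput_mbit(iperf_json: str) -> float:
    """Extract receiver-side TCP goodput in Mbit/s from `iperf3 -J` output."""
    report = json.loads(iperf_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def suspend_and_measure(sleep_s: int = 20, settle_s: int = 15, test_s: int = 10) -> float:
    """Suspend to RAM (needs root), wait for reassociation, run a reverse TCP test."""
    subprocess.run(["rtcwake", "-m", "mem", "-s", str(sleep_s)], check=True)
    time.sleep(settle_s)  # assumed to be long enough for the link to come back
    result = subprocess.run(
        ["iperf3", "-c", SERVER, "-R", "-J", "-t", str(test_s)],
        check=True, capture_output=True, text=True,
    )
    return goodput_mbit(result.stdout)


def main(cycles: int = 30) -> None:
    """Cycle until one resume leaves TCP goodput below the cut-off."""
    for cycle in range(1, cycles + 1):
        rate = suspend_and_measure()
        print(f"cycle {cycle}: {rate:.1f} Mbit/s")
        if rate < BROKEN_BELOW_MBIT:
            print("link looks broken; stopping so the state can be inspected")
            break
```

Calling `main()` as root runs the loop. `-R` makes the server send, which matches the AP -> STA direction used in the measurements below.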

## Evidence

### iperf3, same link same minute

```
AP -> STA, UDP -b 200M -l 1400 -t 15:
   sender:   200 Mbit/s, 267876 datagrams
   receiver: 102 Mbit/s, 137290 received, 130585 "lost"
   (not real loss; iperf3 UDP counts out-of-window arrivals as lost)

AP -> STA, TCP -t 15:
   3.43 Mbit/s, 521 retransmits, cwnd 1.41-5.66 KB throughout
```

### UDP run: no real loss anywhere

- `ip -s link` delta: `+267,953 packets`, `0 errors`, `0 dropped`
   (AP sent 267,876).
- `/proc/net/snmp` Udp: `RcvbufErrors 0, InErrors 0`.
- ath11k `pdev_stats` delta: `MSDUs delivered to HTT +267,985`.
- `soc_dp_stats` entirely zero: no RXDMA / REO / HAL / TCL / backpressure
   errors of any kind.
- AP `iw station get`: ~1.3% retry rate, -65 dBm ACK signal,
   `expected throughput 1049 Mbps`.

→ Air link clean. Host data path clean. Firmware delivered every
datagram. No drops anywhere.

### TCP socket reorder (`ss -tin` once per second during TCP iperf3)

```
    t (s)    bytes_rx   segs_in   rcv_ooopack
    0        1,291,653       895          158
    1        1,717,365     1,189          210
    2        2,060,541     1,426          274
    3        2,519,557     1,743          335
    4        3,050,973     2,110          397
    5        3,446,277     2,383          450
    6        3,906,741     2,701          513
```

~60 ooo packets/s out of ~370 segs/s = **~16% out-of-order**, sustained.
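
The rates can be recomputed from the `ss -tin` samples. This is a small sketch over the seven rows shown in the table; the tuple layout and the function name are illustrative only.

```python
# (t_seconds, segs_in, rcv_ooopack) copied from the table above
SAMPLES = [
    (0, 895, 158),
    (1, 1189, 210),
    (2, 1426, 274),
    (3, 1743, 335),
    (4, 2110, 397),
    (5, 2383, 450),
    (6, 2701, 513),
]


def ooo_summary(samples):
    """Return (ooo packets/s, segments/s, ooo fraction) between first and last sample."""
    (t0, segs0, ooo0), (t1, segs1, ooo1) = samples[0], samples[-1]
    duration = t1 - t0
    segs = segs1 - segs0
    ooo = ooo1 - ooo0
    return ooo / duration, segs / duration, ooo / segs


ooo_rate, seg_rate, fraction = ooo_summary(SAMPLES)
print(f"{ooo_rate:.1f} ooo/s, {seg_rate:.1f} segs/s, {fraction:.1%} out of order")
```

On these seven rows alone this gives about 59 out-of-order packets/s against about 301 segments/s. The ratio depends on the window chosen, so it need not match the figure quoted above exactly.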

### Packet-level pattern (`tcpdump` on wlpXsY)

Seq normalised to 0 at flow start:

```
22 ms     2896:4344
25 ms     4344:5792
27 ms     1448:2896           <-- late; fills gap from 5 ms earlier
28 ms     5792:7240
54 ms     8688:10136
55 ms     10136:11584
57 ms     7240:8688           <-- late
107 ms    26064:27512
107 ms    28960:30408
108 ms    30408:31856
109 ms    27512:28960         <-- late
156 ms    57920:59368
156 ms    59368:60816
157 ms    56472:57920         <-- late
```

Fingerprint: an A-MPDU subframe is lost on first transmission and
retried, and the retry arrives 2-5 ms later. Working REO HW would hold
back the later subframes until the missing one arrived or the per-TID
reorder timeout (`HAL_DEFAULT_REO_TIMEOUT_USEC`, 40 ms) expired. Here
both the later subframes and the retry are passed up in arrival order.
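
The late arrivals can be picked out of a capture mechanically. The sketch below uses the excerpt above as input and flags any segment that starts below the highest sequence number already seen. That is a simplifying assumption: it would also flag plain retransmitted duplicates, so it is only a first filter. The function name is illustrative.

```python
# (arrival_ms, seq_start, seq_end) copied from the tcpdump excerpt above
ARRIVALS = [
    (22, 2896, 4344),
    (25, 4344, 5792),
    (27, 1448, 2896),
    (28, 5792, 7240),
    (54, 8688, 10136),
    (55, 10136, 11584),
    (57, 7240, 8688),
    (107, 26064, 27512),
    (107, 28960, 30408),
    (108, 30408, 31856),
    (109, 27512, 28960),
    (156, 57920, 59368),
    (156, 59368, 60816),
    (157, 56472, 57920),
]


def late_segments(arrivals):
    """Return segments whose start lies below the highest sequence end seen so far."""
    highest_end = 0
    late = []
    for ms, start, end in arrivals:
        if start < highest_end:
            late.append((ms, start, end))
        highest_end = max(highest_end, end)
    return late


for ms, start, end in late_segments(ARRIVALS):
    print(f"{ms} ms  {start}:{end}  late")
```

Run on the excerpt, this flags the same four segments that are marked `<-- late` by hand.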

## Diagnosis

- Air link healthy; host data path clean; REO HW error counters all
   zero — REO simply isn't enforcing order for this peer's TIDs.
- dmesg across 3 days of suspend cycles shows zero ath11k re-init
   activity (no `fw_version` reprint, no `wcn6855 hw2.1` reprint). The
   firmware instance is the same one from the most recent `modprobe`.
   `ath11k_core_suspend_wow` / `ath11k_core_resume_wow` neither power
   down the device nor re-initialise REO.
- `rmmod` triggers full `ath11k_hif_power_down` + chip re-init on next
   `modprobe`, which re-runs `ath11k_hw_wcn6855_reo_setup`. This is the
   only reliable recovery, so the corrupted state lives in firmware /
   REO HW that the WOW resume path never touches.

The non-WOW path (`ath11k_core_suspend_default`) does power-down + full
re-init on resume, re-running `ath11k_dp_srng_common_setup()` →
`hw_ops->reo_setup()`. The WOW path does not.

## Related

- Bug #219196: unexpected wakeups; the WOW workaround was added to
   mitigate this
- `ce8669a27016`: introduced the WOW quirk table (2025-03-28, Baochen Qiang)
- `0eb002c93c3b`: added `21K5` / `21K6` (this laptop) to the quirk
   table (2025-09-29, Mark Pearson)
- `4015b1972763`: added Z13/Z16 Gen1 to the WOW quirk table (Nov 2025)

Hauke


