[RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1
Lukasz Raczylo
lukasz at raczylo.com
Sat Apr 25 14:48:25 PDT 2026
A follow-up runtime data point on this series.
Fleet state at 2026-04-25 21:46 UTC:
* Patched uptime (since staggered rollout 2026-04-24 18:10-19:20 UTC):
- shortest: 26h 26m (last master upgraded)
- longest: 27h 34m (canary)
- cumulative across 24 nodes: ~651 node-hours
* Macb-attributable event counts (out-of-band userspace watchdog;
the [tx-stall] detector watches /sys/class/net/end0/statistics/
tx_packets + qdisc backlog every 1 s and would have fired
ip link down/up if any node's TX path froze):
- RECOVER trigger=tx-stall (actual stalls caught): 0
- partial [tx-stall] markers (transient 1 s freezes): 0
* Separately: 40 RECOVER events with trigger=ping fired in this
window across the fleet, attributable to a brief upstream-network
outage (gateway / switch event); the nodes lost ping to gateway,
VIP, and NAS within seconds of one another, then recovered. These
events are unrelated to the macb hang the patch series targets;
distinguishing them from a real TX stall is exactly what the
trigger= tag in the watchdog log is for.
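
For readers who want to reproduce the [tx-stall] check, the detection
rule described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the watchdog actually
deployed on the fleet: the STALL_SAMPLES threshold, the classify()
helper, and the parsing of `tc -s qdisc` output for the backlog are
all hypothetical choices made here. Only the sysfs path and the 1 s
sampling interval come from the description above.

```python
import re
import subprocess
from pathlib import Path

IFACE = "end0"        # interface name from the sysfs path quoted above
STALL_SAMPLES = 3     # hypothetical: frozen 1 s samples needed to call it a stall


def read_tx_packets(iface: str = IFACE) -> int:
    """Read the kernel's TX packet counter for the interface from sysfs."""
    path = Path(f"/sys/class/net/{iface}/statistics/tx_packets")
    return int(path.read_text())


def read_backlog_packets(iface: str = IFACE) -> int:
    """Sum the packet backlog over all qdiscs reported by `tc -s qdisc`.

    Assumption: backlog appears as 'backlog <size> <N>p' in tc's output.
    """
    out = subprocess.run(
        ["tc", "-s", "qdisc", "show", "dev", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    return sum(int(n) for n in re.findall(r"backlog \S+ (\d+)p", out))


def classify(samples, stall_samples: int = STALL_SAMPLES) -> str:
    """Classify a run of (tx_packets, backlog_packets) samples taken 1 s apart.

    A sample counts as frozen when tx_packets did not advance since the
    previous sample while packets are still queued. An idle link (no
    backlog) is never treated as frozen.
    """
    frozen = 0
    longest = 0
    for (prev_tx, _), (cur_tx, cur_backlog) in zip(samples, samples[1:]):
        if cur_tx == prev_tx and cur_backlog > 0:
            frozen += 1
        else:
            frozen = 0
        longest = max(longest, frozen)
    if longest >= stall_samples:
        return "stall"      # would map to RECOVER trigger=tx-stall
    if longest > 0:
        return "partial"    # would map to a partial [tx-stall] marker
    return "ok"
```

Requiring a non-empty backlog is what keeps a quiet link from being
misread as a stall: a counter that does not move only matters when
there is traffic waiting to leave.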
At the pre-patch rate referenced in the cover letter (50 stalls in
95 node-hours observed in our 2026-04-24 14:00-18:10 UTC reference
window, ~0.53 per node-hour), the projected stall count in
651 node-hours is on the order of 342; observed is 0.
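
The projection above is simple arithmetic and can be checked with the
short sketch below. The final step, treating stalls as a Poisson
process with a constant per-node-hour rate, is an assumption added
here for illustration and is not a claim made in the cover letter. It
only gives a rough sense of how unlikely zero events would be if the
pre-patch rate still applied.

```python
import math

# Figures quoted in this thread.
pre_patch_stalls = 50        # stalls seen in the reference window
pre_patch_node_hours = 95    # node-hours covered by the reference window
patched_node_hours = 651     # cumulative patched uptime across 24 nodes

# Pre-patch stall rate per node-hour.
rate = pre_patch_stalls / pre_patch_node_hours

# Stalls expected in the patched window if that rate were unchanged.
expected = rate * patched_node_hours

# Assumption: stalls arrive as a Poisson process at a constant rate, so
# the chance of observing zero events is exp(-expected).
p_zero = math.exp(-expected)

print(f"rate     = {rate:.3f} stalls per node-hour")
print(f"expected = {expected:.1f} stalls in {patched_node_hours} node-hours")
print(f"P(0 stalls at the pre-patch rate) = {p_zero:.3g}")
```

Real stalls are unlikely to be independent or evenly spread across
nodes, so the probability should be read as an order-of-magnitude
indication only.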
The same monitoring stays in place; I will reply again after a full
week of uptime unless something changes.
--
Lukasz Raczylo