[FS#1544] High load on Ubiquiti Nanostation XM - maybe related to "workingset_refault" (vmstat)

LEDE Bugs lede-bugs at lists.infradead.org
Fri May 11 12:26:26 PDT 2018


A new Flyspray task has been opened.  Details are below. 

User who did this - Lars (sumpfralle) 

Attached to Project - OpenWrt/LEDE Project
Summary - High load on Ubiquiti Nanostation XM - maybe related to "workingset_refault" (vmstat)
Task Type - Bug Report
Category - Base system
Status - Unconfirmed
Assigned To - 
Operating System - All
Severity - Low
Priority - Very Low
Reported Version - Trunk
Due in Version - Undecided
Due Date - Undecided
Details - Our local wireless community uses a lot of Ubiquiti devices.

They all worked well with Chaos Calmer.

With LEDE 17.01 we started to see load issues with Nanostation M5 XM devices (the older Nanostation model, only 32 MB). We did not notice the issue with any other device up to now.

After a few hours of uptime the routers start to develop a persistently high load (>8) and usually "recover" only after a reboot. Running "wifi up" or "wifi down" does not seem to affect the issue.

The problem is almost non-existent for devices using only a single ethernet port. Devices using both ethernet ports suffer greatly (problems usually start within 24 hours). Thus I could imagine that [[https://bugs.openwrt.org/index.php?do=details&task_id=296|issue #296]] is related (just a wild guess).

Traffic on the wireless interface seems to increase the likelihood of the problem (maybe CPU utilization in general does).

"top" and other tools do not show any processes that could cause the high load.

The only unusual metric that seems to be connected to the high-load situation is "workingset_refault" (see /proc/vmstat).
See the following output:

root@AP-1-96:~# while sleep 10; do grep workingset_ /proc/vmstat; done
workingset_refault 1304983
workingset_activate 392198
workingset_nodereclaim 10330
workingset_refault 1308585
workingset_activate 393391
workingset_nodereclaim 10352
workingset_refault 1308671
workingset_activate 393412
workingset_nodereclaim 10352
workingset_refault 1310284
workingset_activate 393940
workingset_nodereclaim 10374
workingset_refault 1317360
workingset_activate 396226
workingset_nodereclaim 10454
workingset_refault 1317465
workingset_activate 396251
workingset_nodereclaim 10454
workingset_refault 1317540
workingset_activate 396292
workingset_nodereclaim 10454
workingset_refault 1324449
workingset_activate 398402
workingset_nodereclaim 10508
workingset_refault 1328418
workingset_activate 399908
workingset_nodereclaim 10536
workingset_refault 1328796
workingset_activate 400114
workingset_nodereclaim 10536
workingset_refault 1329186
workingset_activate 400213
workingset_nodereclaim 10546
workingset_refault 1333889
workingset_activate 401528
workingset_nodereclaim 10594


Above you see about 13k "workingset_refault" events within the first 60 seconds (roughly 29k over the whole excerpt of about 110 seconds). The "workingset_refault" value stays at zero for routers with the same kernel that do not show this problem. Thus I could imagine that this is related to the high load.
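To make the rate easier to read than the raw counters, the increase per sample can be computed directly on the router. The following is only a minimal sketch: the function name refault_deltas is a hypothetical helper (not part of LEDE/OpenWrt), and it assumes the "name value" line format of /proc/vmstat shown above and a POSIX shell such as busybox ash.

```shell
#!/bin/sh
# refault_deltas (hypothetical helper): read "name value" lines in the
# format of /proc/vmstat from stdin and print how much workingset_refault
# grew between consecutive samples. All other counters are ignored.
refault_deltas() {
    prev=""
    while read -r key value; do
        [ "$key" = "workingset_refault" ] || continue
        if [ -n "$prev" ]; then
            # Difference to the previous sample of the same counter
            echo $((value - prev))
        fi
        prev=$value
    done
}

# Live use on the router (assumption: 10 second interval, as in the
# loop from the report):
#   while sleep 10; do grep workingset_refault /proc/vmstat; done | refault_deltas
```

Fed with the first four samples from the output above, this prints 3602, 86 and 1613, which shows that the refaults arrive in bursts rather than at a steady rate.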

Now I am running out of ideas on how to investigate the issue. Maybe someone can give me a hint about what I could try?

Just for reference: we are also discussing this issue in the bug tracker of our local wireless community (https://dev.opennet-initiative.de/ticket/187 - only in German). That discussion may be a bit hard to read, as we were hunting down different potential causes of the problem. Sadly, each of our theories fell apart without pointing to the root cause.

More information can be found at the following URL:
https://bugs.openwrt.org/index.php?do=details&task_id=1544


