[FS#1426] Probable inet6_dev refcount leak introduced by OpenWrt-specific patch

LEDE Bugs lede-bugs at lists.infradead.org
Sat Mar 10 14:32:38 PST 2018


A new Flyspray task has been opened.  Details are below. 

User who did this - Gwani (Gwani) 

Attached to Project - OpenWrt/LEDE Project
Summary - Probable inet6_dev refcount leak introduced by OpenWrt-specific patch
Task Type - Bug Report
Category - Base system
Status - Unconfirmed
Assigned To - 
Operating System - All
Severity - Low
Priority - Very Low
Reported Version - All
Due in Version - Undecided
Due Date - Undecided
Details - 
====System:====
    * **Device:** Linksys WRT3200ACM //(**Note:** problem is most likely not device-specific)//
    * **OpenWrt/LEDE:** lede-17.01 and Git commit 359273d7f6e5733b84a263f8d3023e9d4adc7d40 (both tested)
    * **Kernel:** Custom-built from https://github.com/openwrt/openwrt.git (both the 4.9 and the 4.14 kernel were tested)
====Problem:====
When a kernel network namespace is deleted, a kworker thread hangs indefinitely waiting for the loopback device inside the namespace to be released. This prevents the creation of any further network namespaces until the system is rebooted. It affects any software that uses network namespaces, such as LXC (which I was experimenting with when I first encountered the problem): LXC containers could be started only once, and LXC would hang when restarting a container or starting another one after any container had been stopped.
====Steps to reproduce:====
    - Build the kernel with namespace support, including network namespaces:
        CONFIG_KERNEL_NAMESPACES=y
        CONFIG_KERNEL_UTS_NS=y
        CONFIG_KERNEL_IPC_NS=y
        CONFIG_KERNEL_USER_NS=y
        CONFIG_KERNEL_PID_NS=y
        CONFIG_KERNEL_NET_NS=y
    - Build BusyBox with the **unshare** utility (Linux System Utilities):
        CONFIG_BUSYBOX_CONFIG_UNSHARE=y
    - Boot into the system, then create and immediately delete a network namespace with:
        root@box:~# unshare -n true
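For repeated testing, the reproduction step can also be driven from a small program. The following is only a minimal sketch in C: `try_netns` and `enum netns_result` are hypothetical names introduced for illustration, the 100 ms polling interval is arbitrary, and the code assumes a Linux system where the caller has CAP_SYS_ADMIN (without it the child fails and the function reports an error).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum netns_result { NETNS_OK, NETNS_TIMEOUT, NETNS_ERROR };

/*
 * Fork a child that enters a new network namespace and exits at once
 * (roughly what "unshare -n true" does), then wait up to timeout_s
 * seconds for the child to finish.
 */
enum netns_result try_netns(int timeout_s)
{
    pid_t pid = fork();

    if (pid < 0)
        return NETNS_ERROR;
    if (pid == 0) {
        /* Child: the namespace is torn down again when we exit. */
        _exit(unshare(CLONE_NEWNET) == 0 ? 0 : 1);
    }

    /* Parent: poll for the child in 100 ms steps. */
    for (int i = 0; i < timeout_s * 10; i++) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                       ? NETNS_OK : NETNS_ERROR;
        if (r < 0)
            return NETNS_ERROR;

        struct timespec ts = { 0, 100 * 1000 * 1000 };
        nanosleep(&ts, NULL);
    }

    /* Best effort only: a child blocked inside the kernel may not die. */
    kill(pid, SIGKILL);
    return NETNS_TIMEOUT;
}
```

Calling `try_netns()` twice in a row as root mirrors the report: on an affected kernel the first call would be expected to return `NETNS_OK`, and a later call `NETNS_TIMEOUT` once the cleanup worker is stuck.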
====Observed symptoms:====

  * root@box:~# ps | grep kworker shows a kworker thread lingering in D state:
        450 root         0 DW   [kworker/u4:3]
  * After a while, root@box:~# dmesg shows:
        [  114.596437] unregister_netdevice: waiting for lo to become free. Usage count = 1
        [  124.728977] unregister_netdevice: waiting for lo to become free. Usage count = 1
        [  134.881391] unregister_netdevice: waiting for lo to become free. Usage count = 1
    The message is repeated indefinitely every 10 seconds.
  * No additional network namespaces can be created; another **unshare -n** hangs indefinitely.
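To check for this symptom automatically, the kernel log line can be matched and its fields extracted. The following is a minimal sketch in C, assuming the message format is exactly as quoted above; `parse_netdev_wait` is a hypothetical helper name, and the 16-byte buffer assumes interface names of at most 15 characters.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look for the "unregister_netdevice: waiting for ..." message in one
 * log line. On a match, store the device name in dev (at least 16
 * bytes) and the usage count in *count, and return 1. Otherwise
 * return 0 and leave the outputs in an unspecified state.
 */
int parse_netdev_wait(const char *line, char *dev, int *count)
{
    static const char marker[] = "unregister_netdevice: waiting for ";
    const char *p = strstr(line, marker);

    if (!p)
        return 0;

    /* Skip the marker; the rest is "<dev> to become free. Usage count = <n>". */
    p += sizeof(marker) - 1;
    return sscanf(p, "%15s to become free. Usage count = %d",
                  dev, count) == 2;
}
```

Feeding each line of `dmesg` output through this helper and watching for the same device to show up repeatedly with a non-zero count would flag the hang described here.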
====Possible cause:====
After poking in the dark for quite some time, [[https://forum.turris.cz/t/turris-os-3-9-1-is-out-in-rc-with-a-number-of-fixes/5918/25|I found this post]] by //HomerSp// in the Turris OS forum. After the kernel developers [[https://bugzilla.kernel.org/show_bug.cgi?id=198189|implied, in response to his first assessment, that the problem was caused by a patch in OpenWrt]], he finally tracked it down to a missing **in6_dev_put()** in [[https://github.com/openwrt/openwrt/blob/master/target/linux/generic/pending-4.14/670-ipv6-allow-rejecting-with-source-address-failed-policy.patch|670-ipv6-allow-rejecting-with-source-address-failed-policy.patch]].
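The suspected mechanism is a reference that is taken but never dropped. The following userspace toy model in C illustrates the idea; it is not kernel code. All `toy_*` names are hypothetical, the four slots simply mirror the route entries named in the diff in this report, and a single counter stands in for the real reference counting, which is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a refcounted kernel object such as struct inet6_dev. */
struct toy_idev {
    int refcnt;
};

/* Hypothetical slots mirroring the special route entries in the diff. */
enum toy_entry {
    TOY_NULL_ENTRY,
    TOY_PROHIBIT_ENTRY,
    TOY_POLICY_FAILED_ENTRY, /* the entry added by the OpenWrt patch */
    TOY_BLK_HOLE_ENTRY,
    TOY_NR_ENTRIES
};

void toy_hold(struct toy_idev *idev)
{
    idev->refcnt++;
}

void toy_put(struct toy_idev *idev)
{
    assert(idev->refcnt > 0); /* a put without a matching hold is a bug */
    idev->refcnt--;
}

/* Device registration: every special entry takes one reference. */
void toy_register(struct toy_idev *idev)
{
    idev->refcnt = 0;
    for (int e = 0; e < TOY_NR_ENTRIES; e++)
        toy_hold(idev);
}

/* Device unregistration: returns the references still outstanding. */
int toy_unregister(struct toy_idev *idev, bool put_policy_failed)
{
    for (int e = 0; e < TOY_NR_ENTRIES; e++) {
        if (e == TOY_POLICY_FAILED_ENTRY && !put_policy_failed)
            continue; /* the missing put: this reference leaks */
        toy_put(idev);
    }
    return idev->refcnt;
}
```

In this model, `toy_unregister(&idev, false)` leaves one reference outstanding and `toy_unregister(&idev, true)` leaves none. A single leftover reference is at least consistent with the "Usage count = 1" message, since whoever waits for the count to reach zero would wait forever.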

====Solution:====
This is my patch, derived from [[https://gist.github.com/HomerSp/8ed5d5b7dcd4175a2fa3351577416a1b|HomerSp's complete modified patch]], which I applied to my kernel source after all other patches:
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3860,6 +3860,7 @@ static int ip6_route_dev_notify(struct notifier_block *this,
 		in6_dev_put_clear(&net->ipv6.ip6_null_entry->rt6i_idev);
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 		in6_dev_put_clear(&net->ipv6.ip6_prohibit_entry->rt6i_idev);
+		in6_dev_put(net->ipv6.ip6_policy_failed_entry->rt6i_idev);
 		in6_dev_put_clear(&net->ipv6.ip6_blk_hole_entry->rt6i_idev);
 #endif
 	}
//**Note:** the line numbers probably do not match, since my repo also contains some other recent upstream patches I tested while trying to solve the problem. You will probably want to fix the patch file itself instead of "patching after the patch" as I did.//

Since the solution doesn't seem to have made it back into the OpenWrt code, I took the liberty of reporting it here to bring the problem to your attention. Although __I can't say anything about its correctness__, it solves the problem, at least for me.

More information can be found at the following URL:
https://bugs.openwrt.org/index.php?do=details&task_id=1426


