Netlink sockets and concurrency

Matt Layher mdlayher at gmail.com
Thu Feb 23 10:01:19 PST 2017


Thanks for the reply.  Yeah, I actually thought about goroutines not 
mapping to threads right after I sent this, and I tried using 
runtime.LockOSThread and runtime.UnlockOSThread as soon as each 
goroutine spun up.

Still encountered the same problem that way though, sadly.  I'll check 
out your link now, thanks!

Also worth noting: I went ahead and tried an actual test with 
genetlink, using the same scenario but looking up family information 
for nlctrl.  I let that run in a loop for 10 minutes, at which point 
'go test' sent SIGQUIT because it ran too long.  No crashes there.  
I'm curious whether something about my "synthetic" test was making it 
act up.

I'll keep looking into it.  Thanks again for the reply.

- Matt


On 02/23/2017 12:46 PM, Dan Williams wrote:
> On Thu, 2017-02-23 at 10:38 -0500, Matt Layher wrote:
>> Hi all,
>>
>> This question isn't directly related to libnl, but rather to netlink
>> and netlink sockets themselves. I wasn't sure where else to ask, but
>> I figured the folks on this list should have some good experience
>> working with netlink.
>>
>> I built a Go package (https://github.com/mdlayher/netlink) for
>> working with netlink sockets, but am seeing some occasional strange
>> behavior when attempting to use multiple sockets from the same
>> application.
>>
>> For whatever reason, netlink appears to occasionally send a reply
>> message to the wrong socket when the sockets are used concurrently.
>> I'm opening 16 genetlink sockets and giving each socket its own
>> "thread" ("goroutine" in Go).  I pick a sequence number at random for
>> each socket, and then increment it each time a message is sent.
> Be careful with concurrency, Go, and system calls.
>
> Go's concurrency model is not a strict 1:1 mapping between goroutines
> and OS threads.  The Go scheduler will often mix and match goroutines
> between OS threads on the fly, and you can never guarantee which
> goroutine is running on which OS thread, even during the life of the
> goroutine.
>
> So don't assume that a goroutine will run on any specific OS thread at
> any point.  Unless...
>
> You can use LockOSThread()/UnlockOSThread() to ensure that a single
> goroutine is the only one on a given OS thread for its lifetime, and
> that no other goroutines will run on that OS thread.  This of course
> costs some scheduling flexibility, since the Go scheduler can't run
> anything else on that OS thread.
>
> For somewhat related info, see:
> https://github.com/containernetworking/cni/tree/master/pkg/ns
>
> I'm not sure why this might cause problems, but you mention threads and
> goroutines and that's a trigger :)
> ---
>
> Anyway, it looks like you're letting the kernel allocate nl_pid.  As a
> test, what if you create a unique nl_pid for each Conn object before
> you bind it, to take the kernel out of the loop for debugging purposes?
>
> Dan
>
>> At this point, I send 10,000 messages from each socket with the
>> flags "request + acknowledge", so netlink will echo back the message
>> I sent to it.  Again, before each message is sent, I increment the
>> internal sequence number of my socket wrapper.
>>
>> For whatever reason, sometimes I receive a reply back from netlink
>> with an unexpected sequence number.  The sequence number often looks
>> like it was meant for another socket in the test, running in a
>> different thread.
>>
>> Is it safe to open multiple sockets to netlink (genetlink,
>> specifically) in the same application and use them concurrently in
>> this way? As far as I can tell, my code is free of race conditions in
>> user-space (verified using Go's race detector).  I am not sharing a
>> single socket between multiple threads.  I am simply sending and
>> receiving on multiple sockets at the same time, in independent
>> threads.
>>
>> It doesn't appear that libnl has any special "global lock", other
>> than the PID assignment map.  I am no C expert, but I'm curious if
>> there is a workaround in libnl for making use of multiple sockets
>> concurrently, to ensure that messages are delivered properly to the
>> expected socket.
>>
>> Thanks for your time.  I'd certainly appreciate any insight you all
>> may have on this matter.
>>
>> - Matt Layher
>>
>> _______________________________________________
>> libnl mailing list
>> libnl at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/libnl
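Dan's debugging suggestion was to choose a unique nl_pid for each socket before binding it. On Linux, that might look like the sketch below. The port ID scheme (process ID in the low 22 bits, a per-socket index above them) and the helper names are hypothetical. Any scheme works as long as every port ID is unique on the host:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// localPortID (hypothetical) combines the process ID's low 22 bits with
// a per-socket index in the upper bits. index must be below 1024 so it
// fits in the remaining 10 bits.
func localPortID(pid int, index uint32) uint32 {
	return uint32(pid)&0x3fffff | index<<22
}

// openGenetlink opens a NETLINK_GENERIC socket and binds it to an explicit
// port ID instead of letting the kernel pick one (which happens for Pid: 0).
func openGenetlink(portID uint32) (int, error) {
	fd, err := syscall.Socket(syscall.AF_NETLINK, syscall.SOCK_RAW, syscall.NETLINK_GENERIC)
	if err != nil {
		return -1, err
	}
	sa := &syscall.SockaddrNetlink{Family: syscall.AF_NETLINK, Pid: portID}
	if err := syscall.Bind(fd, sa); err != nil {
		syscall.Close(fd)
		return -1, err // EADDRINUSE if another socket already holds portID
	}
	return fd, nil
}

func main() {
	for i := uint32(0); i < 4; i++ {
		id := localPortID(os.Getpid(), i)
		fd, err := openGenetlink(id)
		if err != nil {
			fmt.Println("bind failed:", err)
			continue
		}
		defer syscall.Close(fd)
		fmt.Printf("socket %d bound to port ID %d\n", fd, id)
	}
}
```

If the chosen port ID is already taken, bind(2) fails with EADDRINUSE, so a collision is reported immediately at bind time.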



