problem with default local port(nl_pid) when netlink used both via libnl and directly in same application

Brett Ciphery brett.ciphery at windriver.com
Mon May 7 10:39:11 EDT 2012


[Re: problem with default local port(nl_pid) when netlink used both via libnl and directly in same application] On 07/05/2012 (Mon 08:53) Thomas Graf wrote:

> On Mon, May 07, 2012 at 05:05:32AM -0400, Laine Stump wrote:
> > I've just diagnosed a problem in libvirt that traces back to libnl's
> > unilateral decision to use getpid() of the calling process as the
> > default "local port" (nl_pid) for the first netlink socket it creates
> > for each process.
> > 
> > The problem is that this is also the default value used if a piece of
> > code running in that process uses direct system calls to create/bind a
> > netlink socket. In our example, this was the result of calling glibc's
> > getaddrinfo() function, so we weren't even aware that it was happening.
> > Even though getaddrinfo() only keeps its netlink socket connected for a
> > short period, if that is running in a separate thread from the thread
> > that calls nl_handle_alloc()/nl_connect(), the result will be that the
> > bind() in nl_connect() fails with EADDRINUSE.
> > 

Hey,

I'll add one more situation where this symptom might pop up...

If nl_socket_alloc()/nl_connect() does run first, so that the bind()
succeeds with the process pid, and that process then forks, the new
process inherits this fd.  If nl_socket_free() is called later, followed
by another nl_socket_alloc()/nl_connect(), the inherited fd is still
open in the other process, so the port is still taken and the new
bind() fails with EADDRINUSE.

A workaround is to close these fd references in the new process but it
would be quite useful if nl_socket_alloc() was more robust -- of course
no easy task given backwards compatibility.
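
To make the fork case and the workaround concrete, here is a minimal
sketch.  It uses raw netlink sockets rather than libnl so that it stays
self-contained, and it assumes Linux.  open_bound() and
rebind_after_fork() are hypothetical helper names made up for this
illustration, not libnl functions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: open a NETLINK_ROUTE socket bound to local port
 * `port` (nl_pid).  Returns the fd, or -errno on failure. */
int open_bound(unsigned int port)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -errno;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = port;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

/* Hypothetical demo: bind port getpid(), fork, close the parent's fd and
 * try to bind the same port again.  Returns 0 if the second bind worked,
 * otherwise a positive errno value.  Error paths are kept short and may
 * leak fds; this is a sketch, not production code. */
int rebind_after_fork(int child_closes_fd)
{
    unsigned int port = (unsigned int)getpid();
    int ready[2], done[2], fd, fd2, result;
    char c = 0;
    pid_t pid;

    assert(port != 0);      /* nl_pid 0 would let the kernel pick a port */
    fd = open_bound(port);
    if (fd < 0)
        return -fd;
    if (pipe(ready) < 0 || pipe(done) < 0)
        return errno;
    pid = fork();
    if (pid < 0)
        return errno;
    if (pid == 0) {
        close(ready[0]);
        close(done[1]);
        if (child_closes_fd)
            close(fd);      /* the workaround: drop the inherited fd */
        if (write(ready[1], &c, 1) != 1)
            _exit(1);
        if (read(done[0], &c, 1) < 0)   /* block until the parent is done */
            _exit(1);
        _exit(0);
    }
    close(ready[1]);
    close(done[0]);
    close(fd);              /* parent drops its own reference */
    if (read(ready[0], &c, 1) != 1) {
        result = EIO;       /* child never reported in */
    } else {
        fd2 = open_bound(port);         /* same port as before the fork */
        result = fd2 < 0 ? -fd2 : 0;
        if (fd2 >= 0)
            close(fd2);
    }
    close(ready[0]);
    close(done[1]);         /* EOF on the pipe releases the child */
    waitpid(pid, NULL, 0);
    return result;
}
```

On a stock Linux kernel I would expect rebind_after_fork(1) to return 0
and rebind_after_fork(0) to return EADDRINUSE, since the port stays
taken for as long as any process holds an fd for the bound socket.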

Brett


