nl_connect EADDRINUSE error after child is forked

Thu Jul 31 16:00:14 PDT 2014

Hello,
I am using:
libnl version 3.2.16
Linux kernel version 3.0.34
I have encountered a problem where the parent process of a program that forks a child will sometimes become unable to connect a new NETLINK socket. More specifically, after the child is forked, nl_connect() fails with errno set to EADDRINUSE.
This will only happen when libnl chooses the PID for the socket (i.e., when libnl internally uses generate_local_port() to pick the PID).
The sequence is as follows.
Parent: open 1st socket (libnl marks "port" in-use)
Parent: fork (child inherits the 1st socket)
Parent: close 1st socket (libnl marks port available)
Parent: open 2nd socket <-- This will fail
Child:  exec (close-on-exec releases the 1st socket)
Parent: open 3rd socket <-- This will succeed
The sample program at the end of this message shows the problem.
I suspect that this problem occurs because libnl is unaware that the 1st socket remains open after the parent calls nl_socket_free(). Although nl_socket_free() calls close(), the socket/fd has been inherited by the child process, so the kernel keeps the socket alive until the child's copy is closed as well.
libnl ends up prematurely marking the port as free and tries to re-use that port in subsequent nl_socket_alloc() and nl_connect() calls.
Since the Linux kernel checks for NETLINK socket PID collisions (see netlink_insert() in net/netlink/af_netlink.c), nl_connect() fails with errno set to EADDRINUSE until the child calls exec() and close-on-exec causes the first socket to finally really get released.
[Although the child in the sample program calls exec(), clearly there's no guarantee that would happen in other programs.]
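The kernel-side behaviour described above can be reproduced without libnl. The following is only a minimal sketch using raw AF_NETLINK sockets and an explicitly chosen netlink PID; the helper names open_bound() and rebind_errno_while_child_holds() are made up for illustration and are not part of libnl or the kernel API. A pipe is used instead of sleep() so that the child holds the inherited fd for exactly as long as the parent needs.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: open a NETLINK_ROUTE socket and bind it to an
 * explicit netlink PID ("port"). Returns the fd, or -1 with errno set. */
static int open_bound(unsigned int port)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = port;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: bind `port`, fork, close the socket in the parent,
 * then try to bind `port` again while the child still holds its inherited
 * copy. Returns the errno of the second bind (0 if it succeeded), or -1 if
 * the setup itself failed. */
static int rebind_errno_while_child_holds(unsigned int port)
{
    int gate[2];
    int fd, fd2, result;
    pid_t child;
    char c;

    fd = open_bound(port);
    if (fd < 0)
        return -1;
    if (pipe(gate) < 0) {
        close(fd);
        return -1;
    }
    child = fork();
    if (child < 0) {
        close(fd);
        close(gate[0]);
        close(gate[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: keep the inherited socket open until the parent closes
         * the write end of the pipe, then exit (which releases it). */
        close(gate[1]);
        while (read(gate[0], &c, 1) > 0)
            ;
        _exit(0);
    }

    close(gate[0]);
    close(fd);                  /* not the last reference: child has one */
    fd2 = open_bound(port);     /* same port while the child holds it */
    result = (fd2 < 0) ? errno : 0;
    if (fd2 >= 0)
        close(fd2);

    close(gate[1]);             /* let the child exit */
    waitpid(child, NULL, 0);
    return result;
}
```

Calling rebind_errno_while_child_holds() with a port that no other socket is using should report EADDRINUSE, and a subsequent open_bound() on the same port, made after the child has exited, should succeed.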
It seems like the problematic aspect here may be that libnl is trying to keep track of the available "ports" based on successful socket() and close() calls but port (really, NETLINK socket PID) availability can only be accurately determined in the kernel based on when sockets actually get added or removed from the kernel's NETLINK socket hash table.
In other words, although there is no guarantee that any given socket will get released when libnl calls close(), libnl acts as if its close() calls will always lead to the socket being released so it is safe to return the associated port to the pool of available ports.
Assuming the above is correct, I'm not really sure how this would best be solved but here are a couple of possibilities.
1a) Remove port selection support from libnl and use the auto-bind function provided by the kernel instead. Note that the kernel's auto-bind PID selection algorithm differs from the port scheme libnl uses so this could cause trouble for existing implementations that are relying on the PID libnl generates today.
1b) Remove port selection support from libnl and, for backwards compatibility, conditionally (via a new socket option?) have the kernel's autobind function apply the same port scheme that libnl has been using. This would entail modifying the kernel.
2) Implement a "best effort" auto-bind in libnl.
Basically, instead of giving up when bind() returns EADDRINUSE on the first available port, try each port marked in the pool as available until the bind() call succeeds or an error other than EADDRINUSE is returned. Since libnl will not receive notification when the socket is closed through some external means (such as close-on-exec or a direct invocation of close() on the fd representing the socket), libnl would need to avoid keying off of EADDRINUSE to mark any port as in-use.
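To make options 1a and 2 more concrete, here is a rough sketch written against raw AF_NETLINK sockets rather than libnl internals. All function names (try_bind(), bind_autobind(), bind_best_effort()) and the MAX_CANDIDATES limit are hypothetical. The candidate layout of (pid & 0x3FFFFF) + (n << 22) is my assumption about the scheme generate_local_port() uses and should be checked against the libnl source. The only kernel behaviour relied on is that binding with nl_pid set to 0 asks the kernel to auto-bind.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>

#define MAX_CANDIDATES 32   /* hypothetical cap on ports to try */

/* Try to bind fd to one netlink PID; 0 on success, -1 with errno set. */
static int try_bind(int fd, unsigned int port)
{
    struct sockaddr_nl addr;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = port;
    return bind(fd, (struct sockaddr *)&addr, sizeof(addr));
}

/* Option 1a: bind with nl_pid == 0 so the kernel picks the PID, then read
 * the chosen value back with getsockname(). */
static int bind_autobind(int fd, unsigned int *port_out)
{
    struct sockaddr_nl addr;
    socklen_t len = sizeof(addr);

    if (try_bind(fd, 0) < 0)
        return -1;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    *port_out = addr.nl_pid;
    return 0;
}

/* Option 2: walk the candidate ports and keep going on EADDRINUSE. Nothing
 * is recorded about ports that turned out to be busy, because user space
 * cannot tell when they become free again. */
static int bind_best_effort(int fd, unsigned int *port_out)
{
    unsigned int base = (unsigned int)getpid() & 0x3FFFFF;
    unsigned int n;

    for (n = 0; n < MAX_CANDIDATES; n++) {
        unsigned int port = base + (n << 22);   /* assumed libnl layout */

        if (try_bind(fd, port) == 0) {
            *port_out = port;
            return 0;
        }
        if (errno != EADDRINUSE)
            return -1;          /* a real error: stop retrying */
    }
    errno = EADDRINUSE;         /* every candidate was busy */
    return -1;
}
```

With something like bind_best_effort(), a second socket opened while an earlier one still holds the first candidate simply ends up on the next candidate instead of failing. bind_autobind() avoids the user-space bookkeeping entirely, at the cost of the compatibility concern noted under 1a.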
Hopefully, someone else will have a better suggestion to offer.
Thank you,
- Andrew
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* libnl includes. */
#include <netlink/netlink.h>

int main( int argc, char **argv ){
    struct nl_sock*     nlsk = NULL;
    int                 status;
    pid_t               pid;

    /* Open a socket _before_ forking a child. */
    nlsk = nl_socket_alloc();
    if( !nlsk ){
        printf( "Socket #1 allocation failed: %m\n" );
        exit( 1 );
    }
    if( nl_connect( nlsk, NETLINK_ROUTE ) != 0 ){
        printf( "Socket #1 open failed: %m\n" );
        exit( 1 );
    }
    printf( "Socket #1 OK, forking child\n" );

    pid = fork();
    if( pid < 0 ){
        printf( "Fork failed: %m\n" );
        exit( 1 );
    }
    if( 0 == pid ){ /* Child */
        /*
         * Wait 3 seconds and then exec. Within the 3 second delay, the NETLINK
         * socket will remain open even after nl_socket_free() is called by the
         * parent. The socket will only get released when the child calls exec
         * as that triggers close-on-exec.
         */
        sleep( 3 );
        execlp( "echo", "echo", "Child execs (socket #1 is now really closed)", NULL );
        printf( "Child execlp failed: %m\n" );
        exit( 1 );
    }

    /* Parent */
    /*
     * Call close _after_ the child has been forked (nl_socket_free()
     * internally calls close()).
     */
    nl_socket_free( nlsk );

    /* Try to connect a new NETLINK socket. This will fail with EADDRINUSE. */
    nlsk = nl_socket_alloc();
    if( !nlsk ){
        printf( "Socket #2 allocation failed: %m\n" );
        exit( 1 );
    }
    if( nl_connect( nlsk, NETLINK_ROUTE ) == 0 ){
        printf( "Socket #2 open unexpectedly succeeded\n" );
        exit( 1 );
    }
    if( errno != EADDRINUSE ){
        printf( "Socket #2 open failed with unexpected error: %m\n" );
        exit( 1 );
    }
    nl_socket_free( nlsk );
    printf( "Socket #2 OK (open failed with EADDRINUSE), waiting for child to exit\n" );

    /* Wait for the child to exec and exit. */
    waitpid( pid, &status, 0 );

    /* Since the child has exec'ed, nl_connect() should now succeed. */
    nlsk = nl_socket_alloc();
    if( !nlsk ){
        printf( "Socket #3 allocation failed: %m\n" );
        exit( 1 );
    }
    if( nl_connect( nlsk, NETLINK_ROUTE ) != 0 ){
        printf( "Socket #3 open failed: %m\n" );
        exit( 1 );
    }
    nl_socket_free( nlsk );
    printf( "Socket #3 OK\n" );
    exit( 0 );
}