rxrpc kernel sockets hold additional reference to dst

Vadim Fedorenko vfedorenko at novek.ru
Wed Jan 27 22:47:01 EST 2021


Hi!
I found the root cause of an old syzkaller bug:
https://syzkaller.appspot.com/bug?id=949ecf93b67ab1df8f890571d24ef9db50872c96
RXRPC sockets are built on top of UDP sockets, so __udp4_lib_rcv sets
sk->sk_rx_dst and takes a reference for such sockets, but
rxrpc_sock_destructor never releases this reference. However, simply adding
dst_release(sk->sk_rx_dst) to rxrpc_sock_destructor doesn't help when the
namespace of the rxrpc socket is being destroyed. This is because the order
of ops_free is such that netdevices are destroyed before kernel sockets.
The result is a deadlock: the rxrpc socket holds a reference to a dst_entry,
which in turn holds a reference to a device in the namespace. So ops_free
cannot destroy all the netdevices in the namespace, while the rxrpc socket
waits for the next ops_free operation, which is only executed after the
netdevices are destroyed.
My attempt to change the exit operation of rxrpc to pre-exit does not work
well, so I need advice on how to deal with this deadlock.

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 0a2f481..8f50238 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -833,10 +842,16 @@ static void rxrpc_sock_destructor(struct sock *sk)
         _enter("%p", sk);

         rxrpc_purge_queue(&sk->sk_receive_queue);
+       dst_release(sk->sk_rx_dst);

         WARN_ON(refcount_read(&sk->sk_wmem_alloc));
         WARN_ON(!sk_unhashed(sk));
diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c
index 25bbc4c..9284d82 100644
--- a/net/rxrpc/net_ns.c
+++ b/net/rxrpc/net_ns.c
@@ -108,10 +108,11 @@ static __net_init int rxrpc_init_net(struct net *net)
  /*
   * Clean up a per-network namespace record.
   */
-static __net_exit void rxrpc_exit_net(struct net *net)
+static __net_exit void rxrpc_pre_exit_net(struct net *net)
  {
         struct rxrpc_net *rxnet = rxrpc_net(net);

@@ -124,7 +125,7 @@ static __net_exit void rxrpc_exit_net(struct net *net)

  struct pernet_operations rxrpc_net_ops = {
         .init   = rxrpc_init_net,
-       .exit   = rxrpc_exit_net,
+       .pre_exit       = rxrpc_pre_exit_net,
         .id     = &rxrpc_net_id,
         .size   = sizeof(struct rxrpc_net),
  };


