rxrpc kernel sockets hold additional reference to dst

David Howells dhowells at redhat.com
Thu Jan 28 16:11:48 EST 2021


Vadim Fedorenko <vfedorenko at novek.ru> wrote:

> On 28.01.2021 13:44, David Howells wrote:
> > Vadim Fedorenko <vfedorenko at novek.ru> wrote:
> > 
> >> Oh, I see, tun_get_user sets skb->sk while building skb. That's why udp_rcv
> >> sets sock->sk_rx_dst. I think we should clean skb->sk explicitly somewhere in
> >> receive path?
> > Whose sock->sk_rx_dst?  How does it get set on an rxrpc socket, since udp
> > can't see it?
> > Does a function need to be called by rxrpc to drop any IP/UDP wibbly bits
> > attached to the socket buffer it just got?
> 
> I think I finally found what happens. rxrpc_open_socket() registers kernel udp
> socket with encap_type = UDP_ENCAP_RXRPC. kernel_bind() adds this socket to 
> hashtable udp_table.hash2. But because this socket is configured with addr = 0
> and port = 0, function __udp4_lib_demux_lookup finds it as a socket for any
> broadcast udp packet with dport == 0.

So afs_open_socket() tries to bind a socket to port AFS_CM_PORT (7001),
address 0 initially, but if that fails, it will try instead port 0, address 0,
in an attempt to ask the UDP socket to select a port.
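
As a rough userspace illustration of that fallback (not the kernel code in
afs_open_socket(); the helper names bind_any() and open_udp_with_fallback()
are made up for this sketch), a plain UDP socket can try the preferred port
first and then bind to port 0 so that the stack picks a free one:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PREFERRED_PORT 7001 /* AFS_CM_PORT, as quoted in the message */

/* Hypothetical helper: bind fd to INADDR_ANY:port; 0 on success, -1 on error. */
static int bind_any(int fd, unsigned short port)
{
	struct sockaddr_in sin;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(port);
	return bind(fd, (struct sockaddr *)&sin, sizeof(sin));
}

/*
 * Hypothetical helper: open a UDP socket, try the preferred port, and fall
 * back to port 0 so that the stack selects a free port.  Returns the fd
 * (or -1 on error) and stores the port actually bound in *out_port.
 */
static int open_udp_with_fallback(unsigned short preferred,
				  unsigned short *out_port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int fd;

	assert(out_port != NULL);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* Preferred port first; only if that fails, ask for any free port. */
	if (bind_any(fd, preferred) < 0 && bind_any(fd, 0) < 0)
		goto fail;

	/* Read back which port the socket actually ended up on. */
	if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0)
		goto fail;

	*out_port = ntohs(sin.sin_port);
	return fd;

fail:
	close(fd);
	return -1;
}
```

Calling open_udp_with_fallback(PREFERRED_PORT, &port) twice in one process
should leave the second socket on a different, stack-chosen port, since the
first one still holds the preferred port.  This only shows the userspace
behaviour of bind(); it makes no claim about what kernel_bind() leaves in
udp_table for the rxrpc socket.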

This is most likely to happen when a new net namespace gets created, since
there's normally one port used per namespace.  Userspace tools may also use
an AF_RXRPC socket, but they wouldn't normally bind it (a server would).

> So if udp_early_demux is set, ip_rcv_finish_core calls
> __udp4_lib_demux_lookup, which finds rxrpc kernel socket with encap_rcv handler
> set, and saves this socket to skb->sk. After that in __udp4_lib_rcv
> skb_steal_sock completes successfully and skb->dst is assigned to
> sk->sk_rx_dst with additional reference taken which is never released and
> leaked after rxrpc socket is destroyed. I have added some debug messages to
> illustrate this condition:
> [  752.485217] rxrpc: rxrpc_open_socket: sk 0000000097ac0a9a net 0000000006d7e0b0
> [  752.668254] __netif_receive_skb_core out: skb 0000000047a81488 skb->sk 0000000000b4bc01
> [  752.668266] IPv4: ip_rcv: skb 0000000047a81488 skb->sk 0000000000b4bc01
> [  752.668278] IPv4: ip_rcv out: skb 0000000047a81488 skb->sk 0000000000000000
> [  752.668372] IPv4: ip_rcv_finish: skb 0000000047a81488 skb->sk 0000000000000000
> [  752.668384] IPv4: ip_rcv_finish after l3mdev: skb 0000000047a81488 skb->sk 0000000000000000
> [  752.668395] IPv4: ip_rcv_finish_core: skb 0000000047a81488 skb->sk 0000000000000000
> [  752.668406] IPv4: ip_rcv_finish_core after use hint: skb 0000000047a81488 skb->sk 0000000000000000
> [  752.668417] IPv4: ip_rcv_finish_core in early demux: skb 0000000047a81488 skb->sk 0000000000000000
> [  752.668438] UDP: __udp4_lib_demux_lookup: lport 22811, dport 0, laddr 0, raddr 0, net 0000000006d7e0b0
> [  752.668452] IPv4: ip_rcv_finish_core before ip route input: skb 0000000047a81488 skb->sk 0000000097ac0a9a
> [  752.668475] IPv4: ip_route_input_slow: skb: 0000000047a81488 skb->sk: 0000000097ac0a9a, net 0000000006d7e0b0
> [  752.668502] IPv4: ip_route_input_slow before dst_alloc: skb: 0000000047a81488 skb->sk: 0000000097ac0a9a, net 0000000006d7e0b0
> [  752.668532] IPv4: ip_route_input_slow: dev: syz_tun net ns: 0000000006d7e0b0
> [  752.668543] IPv4: ip_route_input_slow: rt_dev: lo net ns: 0000000006d7e0b0
> [  752.668595] IPv4: ip_route_input_slow after dst_alloc: skb: 0000000047a81488 skb->sk: 0000000097ac0a9a
> [  752.668607] IPv4: ip_rcv_finish out: skb 0000000047a81488 skb->sk 0000000097ac0a9a
> [  752.668675] UDP: __udp4_lib_rcv: sk 0000000097ac0a9a, sock_net 0000000006d7e0b0, dst->net 0000000006d7e0b0
> [  752.669378] dst_release: dst: 00000000f5e1b944 net ns: 0000000006d7e0b0
> 
> I'm not sure how to deal with early demux in this case. I think we should add
> more restrictions to __udp4_lib_demux_lookup but I'm not sure which exactly.
> Do you have any suggestions?

Is there a way to allocate an unused port from UDP?

Alternatively, should rxrpc discard the dst attached to the skb?  Having it
available could be useful.  One thing I'd like to do at some point is cache
the dst in an rxrpc_connection struct so that I can skip the address lookup
when doing sendmsg.
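
To make the reference-counting concern concrete, here is a toy userspace
model of a socket caching a dst.  It is only a sketch: struct toy_dst, struct
toy_sock and the toy_*() helpers are invented for illustration and are not
the kernel's struct dst_entry, struct sock or dst_hold()/dst_release().  It
shows that whoever caches the dst has to drop that reference at teardown, or
the count never returns to zero:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a route cache entry (hypothetical, not struct dst_entry). */
struct toy_dst {
	int refcnt;
};

/* Toy stand-in for a socket that may cache the dst of a received packet. */
struct toy_sock {
	struct toy_dst *rx_dst;
};

static void toy_dst_hold(struct toy_dst *dst)
{
	dst->refcnt++;
}

static void toy_dst_release(struct toy_dst *dst)
{
	if (dst)
		dst->refcnt--;
}

/* Receive path: cache the packet's dst on the socket, taking a reference. */
static void toy_sock_cache_dst(struct toy_sock *sk, struct toy_dst *dst)
{
	if (sk->rx_dst == dst)
		return;			/* already cached, no extra reference */
	toy_dst_hold(dst);
	toy_dst_release(sk->rx_dst);	/* drop the previously cached dst */
	sk->rx_dst = dst;
}

/* Teardown that forgets the cached dst: the extra reference leaks. */
static void toy_sock_destroy_leaky(struct toy_sock *sk)
{
	sk->rx_dst = NULL;
}

/* Teardown that drops the cached reference before forgetting it. */
static void toy_sock_destroy_fixed(struct toy_sock *sk)
{
	toy_dst_release(sk->rx_dst);
	sk->rx_dst = NULL;
}
```

With a dst that starts at one reference (the packet's), caching it on a
toy_sock raises the count to two.  After toy_sock_destroy_leaky() and the
packet's own release, one reference is still outstanding, whereas
toy_sock_destroy_fixed() lets the count reach zero.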

David



