rxrpc kernel sockets hold additional reference to dst

Vadim Fedorenko vfedorenko at novek.ru
Thu Jan 28 15:13:05 EST 2021


On 28.01.2021 13:44, David Howells wrote:
> Vadim Fedorenko <vfedorenko at novek.ru> wrote:
> 
>> Oh, I see, tun_get_user sets skb->sk while building skb. That's why udp_rcv
>> sets sock->sk_rx_dst. I think we should clean skb->sk explicitly somewhere in
>> receive path?
> 
> Whose sock->sk_rx_dst?  How does it get set on an rxrpc socket, since udp
> can't see it?
> 
> Does a function need to be called by rxrpc to drop any IP/UDP wibbly bits
> attached to the socket buffer it just got?
> 
> David
> 

I think I finally found what happens. rxrpc_open_socket() registers a kernel UDP
socket with encap_type = UDP_ENCAP_RXRPC. kernel_bind() adds this socket to the
udp_table.hash2 hashtable. But because this socket is configured with addr = 0
and port = 0, __udp4_lib_demux_lookup() finds it as the socket for any
broadcast UDP packet with dport == 0.
So if udp_early_demux is set, ip_rcv_finish_core() calls
__udp4_lib_demux_lookup(), which finds the rxrpc kernel socket with the
encap_rcv handler set and saves this socket in skb->sk. After that, in
__udp4_lib_rcv(), skb_steal_sock() succeeds and skb->dst is assigned to
sk->sk_rx_dst with an additional reference taken. That reference is never
released, so the dst is leaked after the rxrpc socket is destroyed. I have
added some debug messages to illustrate this condition:
[  752.485217] rxrpc: rxrpc_open_socket: sk 0000000097ac0a9a net 0000000006d7e0b0
[  752.668254] __netif_receive_skb_core out: skb 0000000047a81488 skb->sk 0000000000b4bc01
[  752.668266] IPv4: ip_rcv: skb 0000000047a81488 skb->sk 0000000000b4bc01
[  752.668278] IPv4: ip_rcv out: skb 0000000047a81488 skb->sk 0000000000000000
[  752.668372] IPv4: ip_rcv_finish: skb 0000000047a81488 skb->sk 0000000000000000
[  752.668384] IPv4: ip_rcv_finish after l3mdev: skb 0000000047a81488 skb->sk 0000000000000000
[  752.668395] IPv4: ip_rcv_finish_core: skb 0000000047a81488 skb->sk 0000000000000000
[  752.668406] IPv4: ip_rcv_finish_core after use hint: skb 0000000047a81488 skb->sk 0000000000000000
[  752.668417] IPv4: ip_rcv_finish_core in early demux: skb 0000000047a81488 skb->sk 0000000000000000
[  752.668438] UDP: __udp4_lib_demux_lookup: lport 22811, dport 0, laddr 0, raddr 0, net 0000000006d7e0b0
[  752.668452] IPv4: ip_rcv_finish_core before ip route input: skb 0000000047a81488 skb->sk 0000000097ac0a9a
[  752.668475] IPv4: ip_route_input_slow: skb: 0000000047a81488 skb->sk: 0000000097ac0a9a, net 0000000006d7e0b0
[  752.668502] IPv4: ip_route_input_slow before dst_alloc: skb: 0000000047a81488 skb->sk: 0000000097ac0a9a, net 0000000006d7e0b0
[  752.668532] IPv4: ip_route_input_slow: dev: syz_tun net ns: 0000000006d7e0b0
[  752.668543] IPv4: ip_route_input_slow: rt_dev: lo net ns: 0000000006d7e0b0
[  752.668595] IPv4: ip_route_input_slow after dst_alloc: skb: 0000000047a81488 skb->sk: 0000000097ac0a9a
[  752.668607] IPv4: ip_rcv_finish out: skb 0000000047a81488 skb->sk 0000000097ac0a9a
[  752.668675] UDP: __udp4_lib_rcv: sk 0000000097ac0a9a, sock_net 0000000006d7e0b0, dst->net 0000000006d7e0b0
[  752.669378] dst_release: dst: 00000000f5e1b944 net ns: 0000000006d7e0b0
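
To make the reference counting in the sequence above easier to follow, here is
a small user-space model of it. This is only a sketch of the flow as described
in this message, not kernel code: every struct and function with a _model
suffix is made up for illustration, and the teardown step simply assumes that
nothing drops the reference cached on the socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for dst_entry, sock and sk_buff (NOT kernel types). */
struct dst_model  { int refcnt; };
struct sock_model { struct dst_model *rx_dst; };
struct skb_model  { struct sock_model *sk; struct dst_model *dst; };

static void dst_hold_model(struct dst_model *d) { d->refcnt++; }

static void dst_release_model(struct dst_model *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* Early demux: the wildcard-bound socket matches and is stored in skb->sk. */
static void early_demux_model(struct skb_model *skb, struct sock_model *sk)
{
	skb->sk = sk;
}

/* Route input: the skb gets a dst and owns one reference to it. */
static void route_input_model(struct skb_model *skb, struct dst_model *d)
{
	dst_hold_model(d);
	skb->dst = d;
}

/* UDP receive: a socket left by early demux caches the dst with its own ref. */
static void udp_rcv_model(struct skb_model *skb)
{
	struct sock_model *sk = skb->sk;

	skb->sk = NULL;                    /* models stealing the socket */
	if (sk && sk->rx_dst != skb->dst) {
		dst_hold_model(skb->dst);  /* the additional reference */
		sk->rx_dst = skb->dst;
	}
	dst_release_model(skb->dst);       /* skb is consumed, drops its ref */
	skb->dst = NULL;
}

/* Returns the references left on the dst after the socket is torn down. */
int leaked_refs_model(int early_demux_enabled)
{
	struct dst_model dst = { 0 };
	struct sock_model sk = { NULL };
	struct skb_model skb = { NULL, NULL };

	if (early_demux_enabled)
		early_demux_model(&skb, &sk);
	route_input_model(&skb, &dst);
	udp_rcv_model(&skb);

	/* Assumed teardown: the socket goes away without releasing rx_dst. */
	sk.rx_dst = NULL;
	return dst.refcnt;
}
```

In this model leaked_refs_model(1) leaves one reference behind, while
leaked_refs_model(0) leaves none, because without the early demux step no
socket is attached to the skb and nothing gets cached on it.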

I'm not sure how to deal with early demux in this case. I think we should add
more restrictions to __udp4_lib_demux_lookup(), but I'm not sure exactly which.
Do you have any suggestions?

Vadim


