rxrpc kernel sockets hold additional reference to dst
Vadim Fedorenko
vfedorenko at novek.ru
Thu Jan 28 15:13:05 EST 2021
On 28.01.2021 13:44, David Howells wrote:
> Vadim Fedorenko <vfedorenko at novek.ru> wrote:
>
>> Oh, I see, tun_get_user sets skb->sk while building skb. That's why udp_rcv
>> sets sock->sk_rx_dst. I think we should clean skb->sk explicitly somewhere in
>> receive path?
>
> Whose sock->sk_rx_dst? How does it get set on an rxrpc socket, since udp
> can't see it?
>
> Does a function need to be called by rxrpc to drop any IP/UDP wibbly bits
> attached to the socket buffer it just got?
>
> David
>
I think I finally found what happens. rxrpc_open_socket() registers a kernel UDP
socket with encap_type = UDP_ENCAP_RXRPC, and kernel_bind() adds this socket to
the udp_table.hash2 hashtable. But because this socket is configured with
addr = 0 and port = 0, __udp4_lib_demux_lookup() finds it as the matching socket
for any broadcast UDP packet with dport == 0.
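To make the matching condition concrete, here is a simplified user-space model. It assumes, as a simplification and not as the kernel's actual code, that the early-demux lookup compares the packet's 4-tuple for exact equality with the addresses and ports stored in the socket. An unconnected socket never has its peer address, peer port or bound address set, so they stay zero, and it then matches any packet that also carries zeros in those fields. That fits the `laddr 0, raddr 0` and `dport 0` values in the __udp4_lib_demux_lookup debug line of the log below. The struct and function names (toy_sock, toy_pkt, toy_demux_match) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a UDP socket's demux keys.
 * Fields that were never set (no connect()) stay zero. */
struct toy_sock {
    uint32_t rcv_saddr; /* bound local address, 0 if unbound */
    uint16_t num;       /* local port */
    uint32_t daddr;     /* connected peer address, 0 if unconnected */
    uint16_t dport;     /* connected peer port, 0 if unconnected */
};

/* Hypothetical, simplified view of an incoming packet's 4-tuple. */
struct toy_pkt {
    uint32_t saddr; /* source address */
    uint16_t sport; /* source port */
    uint32_t daddr; /* destination address */
    uint16_t dest;  /* destination port */
};

/* Exact 4-tuple comparison with no wildcard handling: every field
 * must be equal, so zero only matches zero. */
static bool toy_demux_match(const struct toy_sock *sk, const struct toy_pkt *pkt)
{
    assert(sk != NULL && pkt != NULL);
    return sk->num == pkt->dest &&
           sk->rcv_saddr == pkt->daddr &&
           sk->daddr == pkt->saddr &&
           sk->dport == pkt->sport;
}
```

In this model an unconnected socket on an illustrative port 40000 matches a packet from 0.0.0.0:0 to 0.0.0.0:40000, but not a packet from 10.0.0.1:7001 to 10.0.0.2:40000, because its zeroed fields only compare equal to zeroed packet fields.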
So if udp_early_demux is enabled, ip_rcv_finish_core() calls
__udp4_lib_demux_lookup(), which finds the rxrpc kernel socket (the one with the
encap_rcv handler set) and stores it in skb->sk. After that, skb_steal_sock() in
__udp4_lib_rcv() succeeds, and skb->dst is assigned to sk->sk_rx_dst with an
additional reference taken. That reference is never released, so the dst leaks
once the rxrpc socket is destroyed. I have added some debug messages to
illustrate this condition:
[ 752.485217] rxrpc: rxrpc_open_socket: sk 0000000097ac0a9a net 0000000006d7e0b0
[ 752.668254] __netif_receive_skb_core out: skb 0000000047a81488 skb->sk 0000000000b4bc01
[ 752.668266] IPv4: ip_rcv: skb 0000000047a81488 skb->sk 0000000000b4bc01
[ 752.668278] IPv4: ip_rcv out: skb 0000000047a81488 skb->sk 0000000000000000
[ 752.668372] IPv4: ip_rcv_finish: skb 0000000047a81488 skb->sk 0000000000000000
[ 752.668384] IPv4: ip_rcv_finish after l3mdev: skb 0000000047a81488 skb->sk 0000000000000000
[ 752.668395] IPv4: ip_rcv_finish_core: skb 0000000047a81488 skb->sk 0000000000000000
[ 752.668406] IPv4: ip_rcv_finish_core after use hint: skb 0000000047a81488 skb->sk 0000000000000000
[ 752.668417] IPv4: ip_rcv_finish_core in early demux: skb 0000000047a81488 skb->sk 0000000000000000
[ 752.668438] UDP: __udp4_lib_demux_lookup: lport 22811, dport 0, laddr 0, raddr 0, net 0000000006d7e0b0
[ 752.668452] IPv4: ip_rcv_finish_core before ip route input: skb 0000000047a81488 skb->sk 0000000097ac0a9a
[ 752.668475] IPv4: ip_route_input_slow: skb: 0000000047a81488 skb->sk: 0000000097ac0a9a, net 0000000006d7e0b0
[ 752.668502] IPv4: ip_route_input_slow before dst_alloc: skb: 0000000047a81488 skb->sk: 0000000097ac0a9a, net 0000000006d7e0b0
[ 752.668532] IPv4: ip_route_input_slow: dev: syz_tun net ns: 0000000006d7e0b0
[ 752.668543] IPv4: ip_route_input_slow: rt_dev: lo net ns: 0000000006d7e0b0
[ 752.668595] IPv4: ip_route_input_slow after dst_alloc: skb: 0000000047a81488 skb->sk: 0000000097ac0a9a
[ 752.668607] IPv4: ip_rcv_finish out: skb 0000000047a81488 skb->sk 0000000097ac0a9a
[ 752.668675] UDP: __udp4_lib_rcv: sk 0000000097ac0a9a, sock_net 0000000006d7e0b0, dst->net 0000000006d7e0b0
[ 752.669378] dst_release: dst: 00000000f5e1b944 net ns: 0000000006d7e0b0
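The leak follows a familiar reference-counting pattern: a cached pointer takes its own reference, and whichever path tears the cache down must drop it. The user-space sketch below models only that pattern. toy_dst, toy_rx_sock, toy_cache_rx_dst() and toy_sock_destroy() are hypothetical stand-ins, not kernel APIs, and the `release_cached` flag simply simulates a teardown path that does or does not drop the cached reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a route entry with a reference count. */
struct toy_dst {
    int refcnt;
};

/* Hypothetical stand-in for a socket that caches the input route. */
struct toy_rx_sock {
    struct toy_dst *rx_dst;
};

static void toy_dst_hold(struct toy_dst *dst)
{
    dst->refcnt++;
}

static void toy_dst_release(struct toy_dst *dst)
{
    if (dst == NULL)
        return;
    assert(dst->refcnt > 0); /* catch an underflow */
    dst->refcnt--;
}

/* Caches the packet's route on the socket. The cache takes its own
 * reference, independent of the one held by the packet. */
static void toy_cache_rx_dst(struct toy_rx_sock *sk, struct toy_dst *dst)
{
    toy_dst_hold(dst);
    toy_dst_release(sk->rx_dst); /* drop any previously cached route */
    sk->rx_dst = dst;
}

/* Socket teardown. With release_cached == false the cached reference
 * is forgotten without being dropped, which models the leak. */
static void toy_sock_destroy(struct toy_rx_sock *sk, bool release_cached)
{
    if (release_cached)
        toy_dst_release(sk->rx_dst);
    sk->rx_dst = NULL;
}
```

In this model the route's count only returns to zero when teardown drops the cached reference. In the failing case it stays at one after both the packet and the socket are gone, with nothing left that owns it.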
I'm not sure how to deal with early demux in this case. I think we should add
more restrictions to __udp4_lib_demux_lookup(), but I'm not sure exactly which
ones. Do you have any suggestions?
Vadim
More information about the linux-afs mailing list