rxrpc kernel sockets hold additional reference to dst

Vadim Fedorenko vfedorenko at novek.ru
Fri Jan 29 07:19:47 EST 2021


On 29.01.2021 11:29, David Howells wrote:
> Vadim Fedorenko <vfedorenko at novek.ru> wrote:
> 
>>> Alternatively, should rxrpc discard the dst attached to the skb?  Having it
>>> available could be useful.  One thing I'd like to do at some point is cache
>>> the dst in an rxrpc_connection struct so that I can skip the address lookup
>>> when doing sendmsg.
>>
>> Discarding sk->sk_rx_dst on every rxrpc_input_packet works in this special
>> test but could lead to strange effects in case of concurrent receiving.
> 
> I think I must be getting confused as to what the problem actually is.
> 
> I was talking about skb->_skb_refdst.  I thought you meant that that was the
> ref that was leaking:
> 
> 	After that in __udp4_lib_rcv skb_steal_sock completes successfully and
> 	skb->dst is assigned to sk->sk_rx_dst with additional reference taken
> 	which is never released and leaked after rxrpc socket is destroyed.

Well, the main problem found by syzkaller bot is that leaked dst holds reference 
to netdevice and makes it impossible to destroy namespace. The root cause is 
that rxrpc kernel socket never releases dst set by early demux on UDP level.

And the next problem is a dead lock of releasing references when it was 
initiated by namespace destruction. Releasing of netdevices is done earlier than 
subsystems and it assumes that all the refernces to netdevices are released. But 
in rxrpc subsystem socket destruction (which could release sk_rx_dst) is done in 
subsystem destruction which should be done after netdevices.

> I can see, sk->sk_rx_dst is exchanged each time a packet is processed that's
> from a different source.  Btw, is there a reason we're doing this on a UDP
> socket where there could be packets coming in from all over the place
> simultaneously?  Do we serialise processing of incoming packets from multiple
> interfaces?  I'm just wondering if udp_v4_early_demux() doing:
> 
> 	dst = READ_ONCE(sk->sk_rx_dst);
> 
> is actually useful.  Could it have been swapped several times since we set it?

 From this point net stack uses only input part of dst to deliver skb to correct 
socket. For any stream to socket, the input part will be the same and it 
eliminates costly route lookup. So it is actually useful.

> Also, shouldn't the UDP socket clean up sk->sk_rx_dst when it closes?

General udp socket uses inet_csk() destroy method which takes care of sk_rx_dst. 
In rxrpc socket this method is overriden and that's why it must take care of any 
cached dsts by itself.

>> If you want to cache dst for sendmsg I think you should consider separating
>> registration of subsystem and register network (udp socket) part with
>> register_pernet_device() and add releasing dst references in socket
>> desctructor. In this case socket destructor will be called together with all
>> the devices in namespace and will not lead to dead lock.
> 
> Yeah, I haven't really gotten around to looking deeply into this yet.  Why
> would it deadlock?  Is dst information released earlier?

Yeah, dst holds reference to netdevice and must be released earlier.

> I would expect to be releasing cached dst records when cleaning up the
> connection cache (rxrpc_destroy_all_connections()).  Currently that's called
> from rxrpc_exit_net() and done before the UDP socket is closed so that I can
> flush any outstanding deferred final ACKs of calls done on those
> connections[*].
> 
> [*] An Rx RPC is phased as DATA packets going client->server (the request),
>      DATA packets going server->client (the reply) and then the client sending
>      an ACK to polish off.  However, the final ACK is implicit if we instead
>      send the first DATA packet of the next call on the same connection
>      channel.  This saves a packet.  So final ACK transmission is deferred by a
>      small interval, just in case we can discard it.

I see. But all the network-related references should be released while cleaning 
netdevices. That's why I think it should be moved to netdevices instead of 
subsystems.

Vadim




More information about the linux-afs mailing list