[PATCH] nvme-tcp: proper handling of tcp socket closing flows
Grupi, Elad
Elad.Grupi at dell.com
Sun Jan 31 10:47:07 EST 2021
Is it possible that we will have the following flow:
Thread 1 taking sk_callback_lock and goes into the switch case of sk_state, going to line:
sock->sk->sk_user_data = queue;
at that point, thread 2 invokes sk_state_change, but this is still the original state_change callback because we didn't set yet to nvmet_tcp_state_change.
Thread 1 continues and set the callback to nvmet_tcp_state_change, but the state was already changed and nvmet_tcp_state_change will be never called.
I think this race still exist even when locked by sk_callback_lock.
-----Original Message-----
From: Sagi Grimberg <sagi at grimberg.me>
Sent: Friday, 29 January 2021 1:34
To: Grupi, Elad; linux-nvme at lists.infradead.org
Subject: Re: [PATCH] nvme-tcp: proper handling of tcp socket closing flows
[EXTERNAL EMAIL]
> That will work if sk_state_change is called under sk_callback_lock.
It is.
> In addition, need to flush_work(&queue->port->accept_work) in nvmet_tcp_release_queue_work because accept_work is still using queue struct after unlocking sk_callback_lock.
But sk->sk_user_data was not set, so I don't see how the release work can invoke.
> What about this instead?
> --
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index c41902f7ce39..6388d18ca7c2 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1494,16 +1494,28 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
> ip_sock_set_tos(sock->sk, inet->rcv_tos);
>
> write_lock_bh(&sock->sk->sk_callback_lock);
> - sock->sk->sk_user_data = queue;
> - queue->data_ready = sock->sk->sk_data_ready;
> - sock->sk->sk_data_ready = nvmet_tcp_data_ready;
> - queue->state_change = sock->sk->sk_state_change;
> - sock->sk->sk_state_change = nvmet_tcp_state_change;
> - queue->write_space = sock->sk->sk_write_space;
> - sock->sk->sk_write_space = nvmet_tcp_write_space;
> + switch (sk->sk_state) {
> + case TCP_FIN_WAIT1:
> + case TCP_CLOSE_WAIT:
> + case TCP_CLOSE:
> + /*
> + * If the socket is already closing, don't even start
> + * consuming it
> + */
> + ret = -ENOTCONN;
> + break;
> + default:
> + sock->sk->sk_user_data = queue;
> + queue->data_ready = sock->sk->sk_data_ready;
> + sock->sk->sk_data_ready = nvmet_tcp_data_ready;
> + queue->state_change = sock->sk->sk_state_change;
> + sock->sk->sk_state_change = nvmet_tcp_state_change;
> + queue->write_space = sock->sk->sk_write_space;
> + sock->sk->sk_write_space = nvmet_tcp_write_space;
> + }
> write_unlock_bh(&sock->sk->sk_callback_lock);
>
> - return 0;
> + return ret;
> }
> --
>
More information about the Linux-nvme
mailing list