[PATCH 2/2] nvmet: fix a race condition between release_queue and io_work

Sagi Grimberg sagi at grimberg.me
Sun Nov 14 02:28:45 PST 2021



On 11/12/21 12:54 PM, Maurizio Lombardi wrote:
> Hi Sagi,
> 
> On Thu, Nov 04, 2021 at 02:59:53PM +0200, Sagi Grimberg wrote:
>>
>> Right, after the call to cancel_work_sync we will know that io_work
>> is not running. Note that it can run as a result of a backend completion,
>> but that is ok and we do want to let it run and return the completion to
>> the host; the socket should already be shut down for recv, so we cannot
>> get any more bytes from the network.
> 
> 
> I did some tests and found out that kernel_recvmsg() sometimes returns
> data even if the socket has already been shut down (perhaps data it
> received before the call to kernel_sock_shutdown() that was still waiting
> in some internal buffer?).
> 
> So when nvmet_sq_destroy() triggered io_work, recvmsg() still returned data
> and the kernel crashed again even though the socket was closed.
> 
> Therefore, I think that after we shut down the socket we should let
> io_work run and requeue itself until it finishes its job and no more
> data is returned by recvmsg(). One way to achieve this is to repeatedly
> call flush_work() until it returns false.
> 
> Right now I am testing the patch below and it works perfectly.
> 
> Note that when the socket is closed recvmsg() might return 0;
> nvmet_tcp_try_recv_data() should return -EAGAIN in that case, otherwise
> we end up in an infinite loop (io_work will continuously requeue itself).
> 
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 2f03a94725ae..7b441071c6b9 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1139,8 +1139,10 @@ static int nvmet_tcp_try_recv_data(struct nvmet_tcp_queue *queue)
>   	while (msg_data_left(&cmd->recv_msg)) {
>   		ret = sock_recvmsg(cmd->queue->sock, &cmd->recv_msg,
>   			cmd->recv_msg.msg_flags);
> -		if (ret <= 0)
> +		if (ret < 0)
>   			return ret;
> +		else if (ret == 0)
> +			return -EAGAIN;
>   
>   		cmd->pdu_recv += ret;
>   		cmd->rbytes_done += ret;
> @@ -1446,8 +1450,10 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
>   	list_del_init(&queue->queue_list);
>   	mutex_unlock(&nvmet_tcp_queue_mutex);
>   
> +	kernel_sock_shutdown(queue->sock, SHUT_RD);
> +
>   	nvmet_tcp_restore_socket_callbacks(queue);

I think you should shut down the socket after you restore the socket
callbacks, and after that call cancel_work_sync() to deny a self-requeue.

> -	flush_work(&queue->io_work);
> +	while (flush_work(&queue->io_work));
>   
>   	nvmet_tcp_uninit_data_in_cmds(queue);
>   	nvmet_sq_destroy(&queue->nvme_sq);
> 

How about something like this:
--
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 6eb0b3153477..6425375faf5b 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1436,7 +1436,9 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
         mutex_unlock(&nvmet_tcp_queue_mutex);

         nvmet_tcp_restore_socket_callbacks(queue);
-       flush_work(&queue->io_work);
+       /* stop accepting incoming data */
+       kernel_sock_shutdown(queue->sock, SHUT_RD);
+       cancel_work_sync(&queue->io_work);

         nvmet_tcp_uninit_data_in_cmds(queue);
         nvmet_sq_destroy(&queue->nvme_sq);
--
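
To spell out why a single cancel_work_sync() should be enough here (a
minimal sketch of the workqueue semantics only; the names below are
illustrative, not from the driver): flush_work() waits for the instance it
observed, so a work item that requeues itself can outlive one call, which
is why the "while (flush_work())" loop was proposed. cancel_work_sync()
additionally makes queue_work() fail while the cancel is in progress and
guarantees the work is neither pending nor running on return.
--
#include <linux/workqueue.h>

static struct work_struct demo_work;
static bool more_data;	/* stands in for "recvmsg() still returns data" */

static void demo_work_fn(struct work_struct *w)
{
	if (more_data)
		queue_work(system_wq, w);	/* self-requeue, like io_work */
}

static void demo_init(void)
{
	INIT_WORK(&demo_work, demo_work_fn);
}

static void demo_teardown(void)
{
	/*
	 * flush_work() returns after the observed instance finishes, but a
	 * requeued instance keeps running.
	 *
	 * cancel_work_sync() waits for a running instance and rejects
	 * queue_work() issued while the cancel is in progress, so one call
	 * is enough once the socket is shut down for receive.
	 */
	cancel_work_sync(&demo_work);
}
--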



