[PATCH 2/2] nvmet: fix a race condition between release_queue and io_work
Sagi Grimberg
sagi at grimberg.me
Mon Nov 15 01:48:38 PST 2021
>> How about something like this:
>> --
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index 6eb0b3153477..6425375faf5b 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -1436,7 +1436,9 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
>>          mutex_unlock(&nvmet_tcp_queue_mutex);
>>
>>          nvmet_tcp_restore_socket_callbacks(queue);
>> -        flush_work(&queue->io_work);
>> +        /* stop accepting incoming data */
>> +        kernel_sock_shutdown(queue->sock, SHUT_RD);
>> +        cancel_work_sync(&queue->io_work);
>>
>>          nvmet_tcp_uninit_data_in_cmds(queue);
>>          nvmet_sq_destroy(&queue->nvme_sq);
>> --
>>
>
> This one doesn't work; remember that nvmet_sq_destroy() will start io_work
> again and, as I wrote in my previous email, recvmsg() might still return some
> data after the socket has been shut down.
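
As a side note, the recvmsg()-after-shutdown behaviour is easy to see in
isolation: on Linux, bytes that were already queued on the socket before
shutdown(SHUT_RD) can still be handed back to the reader. A minimal userspace
sketch over a stream socketpair (purely illustrative, nothing nvmet-specific):
--
/* Illustrative only: data queued before shutdown(SHUT_RD) is still
 * returned to the reader afterwards.
 */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        int sv[2];
        char buf[16];
        ssize_t n;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
                return 1;

        /* queue some bytes on sv[1]'s receive side ... */
        if (write(sv[0], "pending", 7) != 7)
                return 1;

        /* ... then shut down the receive direction */
        shutdown(sv[1], SHUT_RD);

        /* the previously queued bytes are still delivered */
        n = recv(sv[1], buf, sizeof(buf), 0);
        printf("recv() after SHUT_RD returned %zd bytes\n", n);

        close(sv[0]);
        close(sv[1]);
        return 0;
}
--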
I see. The reason we hit this is that we run uninit_data_in_cmds in order to
drop the sq references so nvmet_sq_destroy() can complete, and then when
nvmet_sq_destroy() schedules io_work we hit the race.
I think what we need is to make sure we don't recv from the socket.
How about this patch:
--
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 6eb0b3153477..65210dec3f1a 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1436,6 +1436,8 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
         mutex_unlock(&nvmet_tcp_queue_mutex);

         nvmet_tcp_restore_socket_callbacks(queue);
+        /* stop accepting incoming data */
+        queue->rcv_state = NVMET_TCP_RECV_ERR;
         flush_work(&queue->io_work);

         nvmet_tcp_uninit_data_in_cmds(queue);
--
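
For completeness, the reason the one-liner is enough: the receive side of
io_work is gated on queue->rcv_state, so once it is NVMET_TCP_RECV_ERR a later
io_work invocation (including the one kicked by nvmet_sq_destroy()) never
reads from the socket again. A condensed sketch of that gate, approximated
from drivers/nvme/target/tcp.c (not the verbatim driver code):
--
/*
 * Condensed sketch (approximated, not verbatim): once rcv_state is
 * NVMET_TCP_RECV_ERR, the io_work receive path bails out immediately
 * and never touches the socket again.
 */
static int nvmet_tcp_try_recv_one(struct nvmet_tcp_queue *queue)
{
        int result = 0;

        if (unlikely(queue->rcv_state == NVMET_TCP_RECV_ERR))
                return 0;       /* release path told us to stop receiving */

        if (queue->rcv_state == NVMET_TCP_RECV_PDU)
                result = nvmet_tcp_try_recv_pdu(queue);
        if (!result && queue->rcv_state == NVMET_TCP_RECV_DATA)
                result = nvmet_tcp_try_recv_data(queue);
        if (!result && queue->rcv_state == NVMET_TCP_RECV_DDGST)
                result = nvmet_tcp_try_recv_ddgst(queue);

        if (result < 0)
                return result == -EAGAIN ? 0 : result;
        return 1;
}
--
With that gate in place, the io_work run triggered from nvmet_sq_destroy() can
still make forward progress on the send side without feeding new data from the
wire into commands that have already been uninitialized.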