[PATCH 2/2] nvmet: fix a race condition between release_queue and io_work
Sagi Grimberg
sagi at grimberg.me
Mon Nov 15 02:13:08 PST 2021
>> --
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index 6eb0b3153477..65210dec3f1a 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -1436,6 +1436,8 @@ static void nvmet_tcp_release_queue_work(struct
>> work_struct *w)
>> mutex_unlock(&nvmet_tcp_queue_mutex);
>>
>> nvmet_tcp_restore_socket_callbacks(queue);
>> + /* stop accepting incoming data */
>> + queue->rcv_state = NVMET_TCP_RECV_ERR;
>> flush_work(&queue->io_work);
>>
>> nvmet_tcp_uninit_data_in_cmds(queue);
>> --
>>
>
> Ok I can repeat the test, but you probably want to do this instead:
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index fb72e2d67fd5..d21b525fd4cb 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1450,7 +1450,9 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
> mutex_unlock(&nvmet_tcp_queue_mutex);
>
> nvmet_tcp_restore_socket_callbacks(queue);
> - flush_work(&queue->io_work);
> + cancel_work_sync(&queue->io_work);
> + /* stop accepting incoming data */
> + queue->rcv_state = NVMET_TCP_RECV_ERR;
>
> nvmet_tcp_uninit_data_in_cmds(queue);
> nvmet_sq_destroy(&queue->nvme_sq);
>
> If you don't perform a cancel_work_sync() you may race against a running
> io_work thread that may overwrite rcv_state with some other value.
Yes, that makes sense.
More information about the Linux-nvme
mailing list