[PATCH 2/2] nvmet: fix a race condition between release_queue and io_work

Sun Nov 14 23:52:52 PST 2021

Hi,

On Fri, Nov 12, 2021 at 10:54:42AM -0500, John Meneghini wrote:
> Nice work Maurizio. This should solve some of the problems we are seeing with nvme/tcp shutdown.
> 
> Do you think we have a similar problem on the host side, in nvme_tcp_init_connection?
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 8cb15ee5b249..adca40c932b7 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1271,8 +1271,12 @@ static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue)
>         memset(&msg, 0, sizeof(msg));
>         iov.iov_base = icresp;
>         iov.iov_len = sizeof(*icresp);
> -       ret = kernel_recvmsg(queue->sock, &msg, &iov, 1,
> +
> +       do {
> +               ret = kernel_recvmsg(queue->sock, &msg, &iov, 1,
>                         iov.iov_len, msg.msg_flags);
> +        } while (ret == 0);
> +
>         if (ret < 0)
>                 goto free_icresp;
>

At the moment I don't know whether there is a similar problem on the host side; I'll have to
look into it.
However, I don't think this patch will work. If the socket has been shut down, kernel_recvmsg()
will return 0 on every call and the do/while loop will never terminate.
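
To illustrate the concern, here is a minimal userspace sketch (not kernel code, and not a
proposed fix for nvme_tcp_init_connection). It assumes a POSIX system and uses an AF_UNIX
socketpair in place of the NVMe/TCP connection; recv_exact() and zero_reads_after_shutdown()
are hypothetical helper names made up for this example. The first helper shows one way to
read a fixed-size reply while treating a 0 return as "peer closed" instead of retrying on it.
The second shows that once the peer has shut down its sending side, every further recv()
returns 0, which is the condition that "while (ret == 0)" would spin on.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Userspace illustration only. Both helpers are hypothetical and exist
 * just to demonstrate recv() behaviour after an orderly shutdown.
 */

/*
 * Read exactly len bytes from fd.
 * Returns 0 on success, a negative errno value on failure.
 * A 0 return from recv() means the peer closed the connection, so it is
 * reported as -ECONNRESET and never retried.
 */
static int recv_exact(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t ret = recv(fd, (char *)buf + done, len - done, 0);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal, retry */
			return -errno;
		}
		if (ret == 0)
			return -ECONNRESET;	/* orderly shutdown, no more data */
		done += (size_t)ret;
	}
	return 0;
}

/*
 * Shut down the sending side of one end of a socketpair, then call recv()
 * on the other end `attempts` times and count how many calls return 0.
 * Returns the count, or -1 if the socketpair could not be created.
 */
static int zero_reads_after_shutdown(int attempts)
{
	int sv[2], zeros = 0;
	char byte;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	shutdown(sv[1], SHUT_WR);	/* peer will send nothing more */

	for (int i = 0; i < attempts; i++)
		if (recv(sv[0], &byte, sizeof(byte), 0) == 0)
			zeros++;

	close(sv[0]);
	close(sv[1]);
	return zeros;
}
```

Calling zero_reads_after_shutdown(5) should report that all five reads returned 0, so a loop
that retries on 0 has no exit condition after a shutdown. Whether returning an error such as
-ECONNRESET is the right handling for the kernel path is a separate question; the sketch only
shows that 0 has to be treated as end-of-stream rather than "try again".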

Maurizio