nvme-tcp: Queue deadlock (stuck PDU) on NVMe TCP host driver

Samuel Jones sjones at kalrayinc.com
Fri Dec 11 03:10:46 EST 2020


Hi Sagi,

>> If this occurs, the H2CData PDU will not be sent until io_work is rescheduled. If we are lucky, this will happen soon because of other activity on the socket (new responses from the target, for example). If we are not, the H2CData PDU will never be sent and the queue is deadlocked. Eventually the block I/O timeout will kick in and tear down the queue.
>> 
>> I have not been able to formally prove that the above sequence is what we are observing. However, I have tested rescheduling the workqueue whenever the mutex_trylock() call guarding nvme_tcp_try_send() in nvme_tcp_io_work() fails, and that fixes the issue for us.
>> 
>> 
>> What do you think? I'd be grateful for any comments you may have on this.
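
For anyone following along, here is the send side of nvme_tcp_io_work() as I read it in the current driver, abridged (recv handling and the result < 0 error paths are trimmed). The window is the failed trylock: if the submission path holds send_mutex at that moment, pending stays false and io_work can exit with a queued PDU and nobody scheduled to send it.

--
static void nvme_tcp_io_work(struct work_struct *w)
{
	struct nvme_tcp_queue *queue =
		container_of(w, struct nvme_tcp_queue, io_work);
	unsigned long deadline = jiffies + msecs_to_jiffies(1);

	do {
		bool pending = false;
		int result;

		/*
		 * If the submission path currently holds send_mutex for
		 * its direct send, this trylock fails and pending stays
		 * false: a PDU queued in the meantime (e.g. the H2CData
		 * following an R2T) is left on the queue with nothing
		 * scheduled to send it.
		 */
		if (mutex_trylock(&queue->send_mutex)) {
			result = nvme_tcp_try_send(queue);
			mutex_unlock(&queue->send_mutex);
			if (result > 0)
				pending = true;
		}

		result = nvme_tcp_try_recv(queue);
		if (result > 0)
			pending = true;

		/* nothing to do and nothing requeues us: io_work goes idle */
		if (!pending)
			return;
	} while (!time_after(jiffies, deadline)); /* quota is exhausted */

	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
}
--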

> I agree with the analysis.
 
> I'm assuming that this patch makes the issue go away:
> --
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 1ba659927442..9193b05d7bda 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1122,6 +1122,14 @@ static void nvme_tcp_io_work(struct work_struct *w)
>                                 pending = true;
>                         else if (unlikely(result < 0))
>                                 break;
> +               } else {
> +                       /*
> +                        * The submission path is sending; mark this
> +                        * as pending so we continue or reschedule,
> +                        * because the submission path's direct send
> +                        * does not handle rescheduling itself.
> +                        */
> +                       pending = true;
>                 }
>
>                 result = nvme_tcp_try_recv(queue);
> --
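
For completeness, the other half of the race is the direct send in nvme_tcp_queue_request(), which looks roughly like this (abridged; the more_requests handling is omitted):

--
static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
		bool sync, bool last)
{
	struct nvme_tcp_queue *queue = req->queue;
	bool empty;

	empty = llist_add(&req->lentry, &queue->req_list) &&
		list_empty(&queue->send_list) && !queue->request;

	/*
	 * Direct send: if we run on the queue's io_cpu and the queue
	 * was empty, send inline under send_mutex. While we hold the
	 * mutex, io_work's mutex_trylock() fails; this is exactly the
	 * window the patch above closes by setting pending = true there.
	 */
	if (queue->io_cpu == smp_processor_id() &&
	    sync && empty && mutex_trylock(&queue->send_mutex)) {
		nvme_tcp_try_send(queue);
		mutex_unlock(&queue->send_mutex);
	} else if (last) {
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}
}
--

With the change, a failed trylock in io_work now counts as pending work, so io_work requeues itself and sends the stuck H2CData PDU as soon as the submission path releases send_mutex.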

I confirm this fixes the issue for us.

Regards,
Samuel



