crash at nvme_tcp_init_iter with header digest enabled
Sagi Grimberg
sagi at grimberg.me
Sun Aug 28 05:09:53 PDT 2022
> Hi,
>
> we got a customer bug report against our downstream kernel
> when doing fail over tests with header digest enabled.
>
> The whole crash looks like a use-after-free bug, but
> so far we were not able to figure out where it happens.
>
> nvme nvme13: queue 1: header digest flag is cleared
> nvme nvme13: receive failed: -71
> nvme nvme13: starting error recovery
> nvme nvme7: Reconnecting in 10 seconds...
>
> RIP: nvme_tcp_init_iter
>
> nvme_tcp_recv_skb
> ? tcp_mstamp_refresh
> ? nvme_tcp_submit_async_event
> tcp_read_sock
> nvme_tcp_try_recv
> nvme_tcp_io_work
> process_one_work
> ? process_one_work
> worker_thread
> ? process_one_work
> kthread
> ? set_kthread_struct
> ret_from_fork
>
> In order to rule out that this is caused by a reuse of a command ID, I
> added a test patch which always clears the request pointer (see below)
> and hoped to see
>
> "got bad cqe.command_id %#x on queue %d\n"
>
> but there was none. Instead the crash disappeared. It looks like we are
> not clearing the request in the error path, but so far I haven't figured
> out how this is related to header digest being enabled.
>
> Anyway, this is just an FYI; in case anyone has an idea where to poke
> at, I am listening.
I think I see the problem. The stream is corrupted, and we keep
processing it.
The current logic says that once we hit a header-digest problem, we
immediately stop reading from the socket (rd_enabled=false) and trigger
error recovery.
When rd_enabled=false, we don't act on data_ready callbacks, as we know
we are tearing down the socket. However, we may keep reading from the
socket if the io_work continues and calls try_recv again (mainly because
our error from nvme_tcp_recv_skb is not propagated back).
I think that this will make the issue go away:
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index e82dcfcda29b..3e3ebde4eff5 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1229,7 +1229,7 @@ static void nvme_tcp_io_work(struct work_struct *w)
else if (unlikely(result < 0))
return;
- if (!pending)
+ if (!pending || !queue->rd_enabled)
return;
} while (!time_after(jiffies, deadline)); /* quota is exhausted */
--