crash at nvme_tcp_init_iter with header digest enabled

Sagi Grimberg sagi at grimberg.me
Mon Sep 5 02:46:40 PDT 2022


>> Hi,
>>
>> we got a customer bug report against our downstream kernel
>> when doing fail over tests with header digest enabled.
>>
>> The whole crash looks like a use-after-free bug, but
>> so far we have not been able to figure out where it happens.
>>
>>    nvme nvme13: queue 1: header digest flag is cleared
>>    nvme nvme13: receive failed:  -71
>>    nvme nvme13: starting error recovery
>>    nvme nvme7: Reconnecting in 10 seconds...
>>
>>    RIP: nvme_tcp_init_iter
>>
>>    nvme_tcp_recv_skb
>>    ? tcp_mstamp_refresh
>>    ? nvme_tcp_submit_async_event
>>    tcp_read_sock
>>    nvme_tcp_try_recv
>>    nvme_tcp_io_work
>>    process_one_work
>>    ? process_one_work
>>    worker_thread
>>    ? process_one_work
>>    kthread
>>    ? set_kthread_struct
>>    ret_from_fork
>>
>> In order to rule out that this is caused by a reuse of a command ID, I
>> added a test patch which always clears the request pointer (see below)
>> and hoped to see
>>
>>     "got bad cqe.command_id %#x on queue %d\n"
>>
>> but there was none. Instead, the crash disappeared. It looks like we are
>> not clearing the request in the error path, but so far I haven't figured
>> out how this is related to header digest being enabled.
>>
>> Anyway, this is just an FYI; in case anyone has an idea where to poke
>> at, I am listening.
> 
> I think I see the problem. The stream is corrupted, and we keep
> processing it.
> 
> The current logic says that once we hit a header-digest problem, we
> immediately stop reading from the socket (rd_enabled=false) and trigger
> error recovery.
> 
> When rd_enabled=false, we don't act on data_ready callbacks, as we know
> we are tearing down the socket. However, we may keep reading from the
> socket if io_work continues and calls try_recv again (mainly because
> the error from nvme_tcp_recv_skb is not propagated back).
> 
> I think that this will make the issue go away:
> -- 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index e82dcfcda29b..3e3ebde4eff5 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1229,7 +1229,7 @@ static void nvme_tcp_io_work(struct work_struct *w)
>                  else if (unlikely(result < 0))
>                          return;
> 
> -               if (!pending)
> +               if (!pending || !queue->rd_enabled)
>                          return;
> 
>          } while (!time_after(jiffies, deadline)); /* quota is exhausted */
> -- 

Daniel, any input here?



More information about the Linux-nvme mailing list