nvme tcp receive errors
Keith Busch
kbusch at kernel.org
Tue Apr 27 19:12:36 BST 2021
On Mon, Apr 26, 2021 at 08:10:38PM -0700, Sagi Grimberg wrote:
>
> > This patch successfully clears up the "no space" issues
> > observed during read tests.
> >
> > There are still some issues with write tests that look a bit different.
> > I'll get more details on that for you today,
>
> sure.
>
> > but it's probably okay if
> > you want to make a formal patch for the receive data side.
>
> What should I put on the Reported-by: and Tested-by: tags?
This report and testing were done courtesy of
Narayan Ayalasomayajula <Narayan.Ayalasomayajula at wdc.com>
Before you submit a patch, though, we did additional testing with data
digest enabled and observed a regression with the following error:
nvme nvme0: queue 0: data digest flag is cleared
From looking at the patch, the following part looks a bit suspicious:
> @@ -776,19 +776,20 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
> req->data_recvd += recv_len;
> }
>
> - if (!queue->data_remaining) {
> + if (!queue->data_remaining)
> + nvme_tcp_init_recv_ctx(queue);
The code previously called nvme_tcp_init_recv_ctx() only when
queue->data_digest wasn't set, but now it's called unconditionally. I
see that calling this function clears ddgst_remaining before the digest
has been received, so does that explain the new errors?
> + if (req->data_recvd == req->data_len) {
> if (queue->data_digest) {
> nvme_tcp_ddgst_final(queue->rcv_hash, &queue->exp_ddgst);
> queue->ddgst_remaining = NVME_TCP_DIGEST_LENGTH;
> } else {
> - BUG_ON(req->data_recvd != req->data_len);
> req->cmd_state = NVME_TCP_CMD_DATA_DONE;
> if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
> req->cmd_state = NVME_TCP_CMD_DONE;
> nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
> queue->nr_cqe++;
> }
> - nvme_tcp_init_recv_ctx(queue);
> }
> }
More information about the Linux-nvme mailing list