nvme tcp receive errors

Keith Busch kbusch at kernel.org
Thu Apr 29 04:33:28 BST 2021


On Wed, Apr 28, 2021 at 04:06:12PM -0700, Sagi Grimberg wrote:
> 
> > > In tcp.c:
> > > --
> > > static void nvme_tcp_set_sg_inline(struct nvme_tcp_queue *queue,
> > >                  struct nvme_command *c, u32 data_len)
> > > {
> > >          struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
> > > 
> > >          sg->addr = cpu_to_le64(queue->ctrl->ctrl.icdoff);
> > >          sg->length = cpu_to_le32(data_len);
> > >          sg->type = (NVME_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_OFFSET;
> > > }
> > > 
> > > static void nvme_tcp_set_sg_host_data(struct nvme_command *c,
> > >                  u32 data_len)
> > > {
> > >          struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
> > > 
> > >          sg->addr = 0;
> > >          sg->length = cpu_to_le32(data_len);
> > >          sg->type = (NVME_TRANSPORT_SGL_DATA_DESC << 4) |
> > >                          NVME_SGL_FMT_TRANSPORT_A;
> > > }
> > > --
> > > 
> > > What is the SGL type you see in the traces? Transport-specific SGL
> > > (host-data, i.e. not in-capsule) or inline?
> > 
> > The Sub Type is 0xA, Transport Specific.
> 
> Interesting, I don't see how the host is going to send data
> without it being in-capsule, and before receiving an r2t...
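
For reference, the type byte set by the two helpers quoted above packs the
descriptor type into the upper nibble and the sub type into the lower nibble,
so the traced Sub Type of 0xA matches nvme_tcp_set_sg_host_data(). A minimal
standalone sketch of that decoding follows. The enum values are copied from
include/linux/nvme.h as I understand them; the sgl_* helper names and the
sgl_kind enum are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* SGL sub types (lower nibble), values as in include/linux/nvme.h. */
enum {
	NVME_SGL_FMT_OFFSET      = 0x01,
	NVME_SGL_FMT_TRANSPORT_A = 0x0A,
};

/* SGL descriptor types (upper nibble), values as in include/linux/nvme.h. */
enum {
	NVME_SGL_FMT_DATA_DESC       = 0x00,
	NVME_TRANSPORT_SGL_DATA_DESC = 0x05,
};

/* Hypothetical classification used only in this sketch. */
enum sgl_kind { SGL_INLINE, SGL_HOST_DATA, SGL_OTHER };

/* Upper nibble of the type byte: descriptor type. */
static inline uint8_t sgl_desc_type(uint8_t type)
{
	return type >> 4;
}

/* Lower nibble of the type byte: descriptor sub type. */
static inline uint8_t sgl_sub_type(uint8_t type)
{
	return type & 0x0f;
}

/* Map a raw type byte back to the helper that would have produced it. */
static inline enum sgl_kind sgl_classify(uint8_t type)
{
	if (sgl_desc_type(type) == NVME_SGL_FMT_DATA_DESC &&
	    sgl_sub_type(type) == NVME_SGL_FMT_OFFSET)
		return SGL_INLINE;	/* nvme_tcp_set_sg_inline() */
	if (sgl_desc_type(type) == NVME_TRANSPORT_SGL_DATA_DESC &&
	    sgl_sub_type(type) == NVME_SGL_FMT_TRANSPORT_A)
		return SGL_HOST_DATA;	/* nvme_tcp_set_sg_host_data() */
	return SGL_OTHER;
}
```

With these values, an inline descriptor carries type byte 0x01 and a host-data
descriptor carries 0x5a, so a captured command can be classified from that
single byte.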

The driver tracepoints captured millions of I/Os where everything
happened as expected, so I really think something got confused and
mucked with the wrong request. I've added more tracepoints to increase
visibility because, frankly, I couldn't find how that could happen from
code inspection alone. We will also incorporate your patch below in the
next attempt to reproduce this.
 
> Maybe add this one for a sanity check:
> --
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index eb1feaacd11a..6cb0e13024e5 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -987,6 +987,8 @@ static int nvme_tcp_try_send_cmd_pdu(struct nvme_tcp_request *req)
>         len -= ret;
>         if (!len) {
>                 if (inline_data) {
> +                       pr_err("no way... data_len %d queue_max_inline %ld\n",
> +                               req->data_len, nvme_tcp_inline_data_size(req->queue));
>                         req->state = NVME_TCP_SEND_DATA;
>                         if (queue->data_digest)
>                                 crypto_ahash_init(queue->snd_hash);
> --
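
The condition this debug print probes can be modelled outside the kernel. The
sketch below is a simplified stand-in, not the driver's code: it assumes the
inline limit is the command capsule length minus the 64-byte submission queue
entry, and that only writes no larger than that limit go in-capsule. The
model_* names and struct model_queue are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Size of an NVMe submission queue entry in bytes. */
#define MODEL_SQE_SIZE 64

/* Hypothetical stand-in for the per-queue state relevant here. */
struct model_queue {
	size_t cmnd_capsule_len;	/* total command capsule size in bytes */
};

/* Assumed inline limit: capsule space left after the command itself. */
static inline size_t model_inline_data_size(const struct model_queue *q)
{
	if (q->cmnd_capsule_len < MODEL_SQE_SIZE)
		return 0;	/* avoid unsigned underflow on bad input */
	return q->cmnd_capsule_len - MODEL_SQE_SIZE;
}

/* Assumed rule: only writes that fit in the capsule are sent inline. */
static inline bool model_has_inline_data(const struct model_queue *q,
					 bool is_write, size_t data_len)
{
	return is_write && data_len <= model_inline_data_size(q);
}
```

Under this model, a request that carries the transport-specific (host-data)
SGL should never satisfy model_has_inline_data(), which is why reaching the
inline_data branch with such a request would point at state being modified on
the wrong request rather than at the size check itself.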


