[bug report] nvme sends invalid command capsule over rdma transport for 5KiB write when target supports MSDBD > 1

Sagi Grimberg sagi at grimberg.me
Wed May 26 02:12:08 PDT 2021


> This bug was found using the iozone tool.
> 
> - Linux kernel initiator, SPDK target, RDMA (RoCEv2) transport
> - iozone performs a 5KiB write to an NVMe device with a 512-byte block size
> - The SPDK target has reported that it supports 4KiB of in-capsule data, an MSDBD of 16 (i.e. up to 16 SGL descriptors per command), and an ICDOFF of 0.
> - The Linux kernel sends an NVMe-oF capsule with a command that claims to have 5KiB of data in the command, but actually only has a single SGL element describing 4KiB of data in-capsule.
> - The SPDK target correctly fails this I/O
> 
> This fails on at least 5.11 but worked prior to 5.4. A git bisect shows that this commit is responsible: 38e1800275d3af607e4df92ff49dc2cf442586a4
> 
> I believe the key is the use of MSDBD > 1 and in-capsule data support. This seems to trick the initiator into thinking it can do 5KiB in one command with two SGL elements, but then the initiator goes down the in-capsule data path and can only describe 4KiB that way.

I think it may be an incorrect iteration over the scatterlist (which
would mean that iozone generated a scatterlist with more than 2 entries,
causing the sg_table to be chained).
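
To illustrate the failure mode, here is a small userspace model (toy
struct and made-up segment sizes, not the kernel scatterlist API): once
the table is chained, the last slot of the first chunk is a link to the
next chunk rather than a data entry, so plain pointer arithmetic reads
that link slot as if it held data, while following the link - which is
what for_each_sg()/sg_next() do - reaches the real entries.

#include <stdio.h>
#include <stdbool.h>

struct sg_ent {
	unsigned int len;
	bool is_chain;
	struct sg_ent *chain;
};

/* Simplified sg_next(): step forward, follow a chain link if we hit one. */
static struct sg_ent *sg_step(struct sg_ent *s)
{
	s++;
	return s->is_chain ? s->chain : s;
}

int main(void)
{
	struct sg_ent chunk2[2] = { { .len = 1024 }, { 0 } };
	struct sg_ent chunk1[2] = {
		{ .len = 4096 },
		{ .is_chain = true, .chain = chunk2 },	/* link, not data */
	};
	int count = 2;		/* two real data entries: 4096 + 1024 bytes */
	struct sg_ent *s;
	unsigned int total;
	int i;

	/* Broken walk: pointer increment lands on the chain-link slot. */
	for (total = 0, s = chunk1, i = 0; i < count; i++, s++)
		total += s->len;
	printf("pointer walk: %u bytes\n", total);	/* 4096 - short */

	/* Chain-aware walk, as for_each_sg() would do. */
	for (total = 0, s = chunk1, i = 0; i < count; i++, s = sg_step(s))
		total += s->len;
	printf("chained walk: %u bytes\n", total);	/* 5120 - all data */
	return 0;
}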

Does this make the issue go away?
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8d107b201f16..44cfcaeb5f2e 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1319,22 +1319,23 @@ static int nvme_rdma_map_sg_inline(struct nvme_rdma_queue *queue,
                 struct nvme_rdma_request *req, struct nvme_command *c,
                 int count)
  {
-       struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
-       struct scatterlist *sgl = req->data_sgl.sg_table.sgl;
+       struct nvme_sgl_desc *sgl = &c->common.dptr.sgl;
+       struct scatterlist *sg, *scat = req->data_sgl.sg_table.sgl;
         struct ib_sge *sge = &req->sge[1];
         u32 len = 0;
         int i;

-       for (i = 0; i < count; i++, sgl++, sge++) {
-               sge->addr = sg_dma_address(sgl);
-               sge->length = sg_dma_len(sgl);
+       for_each_sg(scat, sg, count, i) {
+               sge->addr = sg_dma_address(sg);
+               sge->length = sg_dma_len(sg);
                 sge->lkey = queue->device->pd->local_dma_lkey;
                len += sge->length;
+               sge++;
         }

-       sg->addr = cpu_to_le64(queue->ctrl->ctrl.icdoff);
-       sg->length = cpu_to_le32(len);
-       sg->type = (NVME_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_OFFSET;
+       sgl->addr = cpu_to_le64(queue->ctrl->ctrl.icdoff);
+       sgl->length = cpu_to_le32(len);
+       sgl->type = (NVME_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_OFFSET;

         req->num_sge += count;
         return 0;
--


