[bug report] nvme sends invalid command capsule over rdma transport for 5KiB write when target supports MSDBD > 1
Sagi Grimberg
sagi at grimberg.me
Wed May 26 02:12:08 PDT 2021
> This bug was found using the iozone tool.
>
> - Linux kernel initiator, SPDK target, RDMA (RoCEv2) transport
> - iozone is performing a 5KiB write to a 512 byte block size nvme device
> - The SPDK target has reported that it supports 4KiB of in-capsule data, MSDBD of 16 (number of SGL descriptors), and ICDOFF of 0.
> - The Linux kernel sends an NVMe-oF capsule with a command that claims to have 5KiB of data in the command, but actually only has a single SGL element describing 4KiB of data in-capsule.
> - The SPDK target correctly fails this I/O
>
> This fails on at least 5.11 but worked prior to 5.4. A git bisect shows that this commit is responsible: 38e1800275d3af607e4df92ff49dc2cf442586a4
>
> I believe the key is the use of MSDBD > 1 and in-capsule data support. This seems to trick the initiator into thinking it can do 5KiB in one command with two SGL elements, but then the initiator goes down the in-capsule data path and can only describe 4KiB that way.
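I think this may be an incorrect iteration over the scatterlist (which
would mean that iozone generated a scatterlist with more than 2 entries,
causing the sg_table to be chained). The current loop advances with
sgl++, which does not follow a chain link, whereas for_each_sg() steps
with sg_next() and does.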
Does this make the issue go away?
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8d107b201f16..44cfcaeb5f2e 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1319,22 +1319,23 @@ static int nvme_rdma_map_sg_inline(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_request *req, struct nvme_command *c,
 		int count)
 {
-	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
-	struct scatterlist *sgl = req->data_sgl.sg_table.sgl;
+	struct nvme_sgl_desc *sgl = &c->common.dptr.sgl;
+	struct scatterlist *sg, *scat = req->data_sgl.sg_table.sgl;
 	struct ib_sge *sge = &req->sge[1];
 	u32 len = 0;
 	int i;
 
-	for (i = 0; i < count; i++, sgl++, sge++) {
-		sge->addr = sg_dma_address(sgl);
-		sge->length = sg_dma_len(sgl);
+	for_each_sg(scat, sg, count, i) {
+		sge->addr = sg_dma_address(sg);
+		sge->length = sg_dma_len(sg);
 		sge->lkey = queue->device->pd->local_dma_lkey;
 		len += sge->length;
+		sge++;
 	}
 
-	sg->addr = cpu_to_le64(queue->ctrl->ctrl.icdoff);
-	sg->length = cpu_to_le32(len);
-	sg->type = (NVME_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_OFFSET;
+	sgl->addr = cpu_to_le64(queue->ctrl->ctrl.icdoff);
+	sgl->length = cpu_to_le32(len);
+	sgl->type = (NVME_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_OFFSET;
 
 	req->num_sge += count;
 	return 0;
--
More information about the Linux-nvme mailing list