[bug report] nvme sends invalid command capsule over rdma transport for 5KiB write when target supports MSDBD > 1

Max Gurtovoy mgurtovoy at nvidia.com
Thu May 27 10:48:58 PDT 2021


On 5/27/2021 8:43 PM, Walker, Benjamin wrote:
>> From: Max Gurtovoy <mgurtovoy at nvidia.com>
>>   
>> On 5/26/2021 7:00 PM, Christoph Hellwig wrote:
>>> On Wed, May 26, 2021 at 05:49:41PM +0300, Max Gurtovoy wrote:
>>>>> We do need for_each_sg here indeed, but you also need to keep
>>>>> incrementing sge for each loop iteration.  I think we can also drop
>>>>> the scat local variable with just a single users and all the
>>>>> renaming while we're at it.
>>>> Is the above fixing the issue ?
>>>>
>>>> Seems like code refactoring to me, right ?
>>> It fixes support for chained SGLs when using inline segments.  Not
>>> sure if it fixes the original bug report, but the current code is broken.
>> ohh, I see the usage of sg_next is missing.
>>
>> Well the original bug is for sure using inline chained SGLs but I thought it's using
>> 2 sg_nents and NVME_INLINE_SG_CNT == 2.
>>
>> maybe there is a split and the IO is using sg_nents == 3 ? and then we chain.
>>
>> Let's wait for Ben's test report (Ben please make sure to fix Sagi's suggestion
>> with Christoph's comment in your test).
> This patch fixes the bug:
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 37943dc4c2c1..6a3f7f6bd1ab 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1320,16 +1320,17 @@ static int nvme_rdma_map_sg_inline(struct nvme_rdma_queue *queue,
>                  int count)
>   {
>          struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
> -       struct scatterlist *sgl = req->data_sgl.sg_table.sgl;
> +      struct scatterlist *sgl, *scat = req->data_sgl.sg_table.sgl;
>          struct ib_sge *sge = &req->sge[1];
>          u32 len = 0;
>          int i;
>
> -       for (i = 0; i < count; i++, sgl++, sge++) {
> +      for_each_sg(scat, sgl, count, i) {
>                  sge->addr = sg_dma_address(sgl);
>                  sge->length = sg_dma_len(sgl);
>                  sge->lkey = queue->device->pd->local_dma_lkey;
>                  len += sge->length;
> +              sge++;

good news.

So either we use the sge[i].length or sge++.

Both will do the job.


>          }



More information about the Linux-nvme mailing list