nvmeof rdma regression issue on 4.14.0-rc1 (or maybe mlx4?)

Yi Zhang yi.zhang at redhat.com
Thu Oct 19 01:23:38 PDT 2017



On 10/19/2017 02:55 PM, Sagi Grimberg wrote:
>
>>> Hi Yi,
>>>
>>> I was referring to the bug you reported on a simple create_ctrl failed:
>>> https://pastebin.com/7z0XSGSd
>>>
>>> Does it reproduce?
>>>
>> yes, this issue was reproduced during "git bisect" with the patch below
>
> OK, if this does not reproduce with the latest code, lets put it aside
> for now.
>
> So as for the error you see, can you please try the following patch?
Hi Sagi,
With this patch the error log no longer appears on the host side, but I
found that there is no nvme0n1 device node even after the host reports
"nvme nvme0: Successfully reconnected" (a rough sketch of the check I
think is involved follows the lsblk output below).

Host side:
[   98.181089] nvme nvme0: new ctrl: NQN 
"nqn.2014-08.org.nvmexpress.discovery", addr 172.31.0.90:4420
[   98.329464] nvme nvme0: creating 40 I/O queues.
[   98.835409] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.0.90:4420
[  107.873586] nvme nvme0: Reconnecting in 10 seconds...
[  118.505937] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[  118.513443] nvme nvme0: rdma_resolve_addr wait failed (-104).
[  118.519875] nvme nvme0: Failed reconnect attempt 1
[  118.525241] nvme nvme0: Reconnecting in 10 seconds...
[  128.733311] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[  128.740812] nvme nvme0: rdma_resolve_addr wait failed (-104).
[  128.747247] nvme nvme0: Failed reconnect attempt 2
[  128.752609] nvme nvme0: Reconnecting in 10 seconds...
[  138.973404] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[  138.980904] nvme nvme0: rdma_resolve_addr wait failed (-104).
[  138.987329] nvme nvme0: Failed reconnect attempt 3
[  138.992691] nvme nvme0: Reconnecting in 10 seconds...
[  149.232610] nvme nvme0: creating 40 I/O queues.
[  149.831443] nvme nvme0: Successfully reconnected
[  149.831519] nvme nvme0: identifiers changed for nsid 1
[root@rdma-virt-01 linux ((dafb1b2...))]$ lsblk
NAME                           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                              8:0    0 465.8G  0 disk
├─sda2                           8:2    0 464.8G  0 part
│ ├─rhelaa_rdma--virt--01-swap 253:1    0     4G  0 lvm  [SWAP]
│ ├─rhelaa_rdma--virt--01-home 253:2    0 410.8G  0 lvm  /home
│ └─rhelaa_rdma--virt--01-root 253:0    0    50G  0 lvm  /
└─sda1                           8:1    0     1G  0 part /boot
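
The "identifiers changed for nsid 1" message presumably explains the
missing nvme0n1: after the reconnect the namespace identifiers are
re-read and no longer match the cached ones, so the core drops the
namespace. Below is a rough, self-contained sketch of that check (my
approximation of the 4.14-era nvme_revalidate_disk() behaviour, not the
exact upstream code; struct ns_ids and revalidate_ns_ids() are made-up
names for illustration only):

#include <linux/types.h>
#include <linux/string.h>
#include <linux/uuid.h>
#include <linux/printk.h>
#include <linux/errno.h>

/* Cached vs. freshly read namespace identifiers (EUI-64, NGUID, UUID). */
struct ns_ids {
	u8	eui64[8];
	u8	nguid[16];
	uuid_t	uuid;
};

/*
 * Compare the identifiers reported after reconnect with the ones cached
 * at the original connect; any mismatch logs the error seen above and
 * returns -ENODEV, after which the caller tears down the block device.
 */
static int revalidate_ns_ids(const struct ns_ids *cached,
			     const struct ns_ids *fresh, unsigned int nsid)
{
	if (memcmp(cached->eui64, fresh->eui64, sizeof(cached->eui64)) ||
	    memcmp(cached->nguid, fresh->nguid, sizeof(cached->nguid)) ||
	    !uuid_equal(&cached->uuid, &fresh->uuid)) {
		pr_err("identifiers changed for nsid %u\n", nsid);
		return -ENODEV;
	}
	return 0;
}

If that is the path being hit, the namespace is removed rather than
re-attached, which would match the lsblk output above.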

> -- 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 405895b1dff2..916658e010ff 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -572,6 +572,11 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
>         if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
>                 return;
>
> +       if (nvme_rdma_queue_idx(queue) == 0)
> +               nvme_rdma_free_qe(queue->device->dev,
> +                       &queue->ctrl->async_event_sqe,
> +                       sizeof(struct nvme_command), DMA_TO_DEVICE);
> +
>         nvme_rdma_destroy_queue_ib(queue);
>         rdma_destroy_id(queue->cm_id);
>  }
> @@ -739,8 +744,6 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
>  static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
>                 bool remove)
>  {
> -       nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
> -                       sizeof(struct nvme_command), DMA_TO_DEVICE);
>         nvme_rdma_stop_queue(&ctrl->queues[0]);
>         if (remove) {
>                 blk_cleanup_queue(ctrl->ctrl.admin_q);
> -- 



