nvmeof rdma regression issue on 4.14.0-rc1 (or maybe mlx4?)
Yi Zhang
yi.zhang at redhat.com
Thu Oct 19 01:23:38 PDT 2017
On 10/19/2017 02:55 PM, Sagi Grimberg wrote:
>
>>> Hi Yi,
>>>
>>> I was referring to the bug you reported where a simple create_ctrl failed:
>>> https://pastebin.com/7z0XSGSd
>>>
>>> Does it reproduce?
>>>
>> Yes, this issue was reproduced during "git bisect" with the patch below.
>
> OK, if this does not reproduce with the latest code, let's put it aside
> for now.
>
> So as for the error you see, can you please try the following patch?
Hi Sagi

With this patch the error log no longer shows up on the host side, but there
is no nvme0n1 device node even after the host reports "nvme nvme0:
Successfully reconnected". Note the "identifiers changed for nsid 1" line in
the log below; I suspect that is the reason (see the sketch after the lsblk
output).
Host side:
[ 98.181089] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.0.90:4420
[ 98.329464] nvme nvme0: creating 40 I/O queues.
[ 98.835409] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.0.90:4420
[ 107.873586] nvme nvme0: Reconnecting in 10 seconds...
[ 118.505937] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 118.513443] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 118.519875] nvme nvme0: Failed reconnect attempt 1
[ 118.525241] nvme nvme0: Reconnecting in 10 seconds...
[ 128.733311] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 128.740812] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 128.747247] nvme nvme0: Failed reconnect attempt 2
[ 128.752609] nvme nvme0: Reconnecting in 10 seconds...
[ 138.973404] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 138.980904] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 138.987329] nvme nvme0: Failed reconnect attempt 3
[ 138.992691] nvme nvme0: Reconnecting in 10 seconds...
[ 149.232610] nvme nvme0: creating 40 I/O queues.
[ 149.831443] nvme nvme0: Successfully reconnected
[ 149.831519] nvme nvme0: identifiers changed for nsid 1
[root@rdma-virt-01 linux ((dafb1b2...))]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.8G 0 disk
├─sda2 8:2 0 464.8G 0 part
│ ├─rhelaa_rdma--virt--01-swap 253:1 0 4G 0 lvm [SWAP]
│ ├─rhelaa_rdma--virt--01-home 253:2 0 410.8G 0 lvm /home
│ └─rhelaa_rdma--virt--01-root 253:0 0 50G 0 lvm /
└─sda1 8:1 0 1G 0 part /boot
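
The "identifiers changed for nsid 1" line is, I suspect, why the block device
is gone: on revalidation after the reconnect, the core driver compares the
namespace identifiers (EUI-64/NGUID/UUID) it cached when the namespace was
created against the ones the target reports now, and fails revalidation if
they differ, which removes the namespace. A rough sketch of that check (not
the exact core code; the struct and helper names here are only illustrative):

struct ns_ids {				/* illustrative, simplified */
	u8	eui64[8];
	u8	nguid[16];
	uuid_t	uuid;
};

/*
 * Compare the identifiers cached at namespace creation with the ones just
 * re-read via Identify Namespace.  A mismatch fails revalidation, the caller
 * removes the namespace, and the device node disappears even though the
 * controller itself reconnected fine.
 */
static int check_ns_identifiers(struct device *dev, unsigned int nsid,
		const struct ns_ids *cached, const struct ns_ids *reported)
{
	if (memcmp(cached, reported, sizeof(*cached))) {
		dev_err(dev, "identifiers changed for nsid %u\n", nsid);
		return -ENODEV;		/* namespace gets removed */
	}
	return 0;
}

So if the target exposes the namespace with different identifiers after the
reconnect (for example because the subsystem/namespace was re-created), the
host will drop nvme0n1 even though the reconnect succeeds.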
> --
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 405895b1dff2..916658e010ff 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -572,6 +572,11 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
>  	if (!test_and_clear_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
>  		return;
> 
> +	if (nvme_rdma_queue_idx(queue) == 0)
> +		nvme_rdma_free_qe(queue->device->dev,
> +				&queue->ctrl->async_event_sqe,
> +				sizeof(struct nvme_command), DMA_TO_DEVICE);
> +
>  	nvme_rdma_destroy_queue_ib(queue);
>  	rdma_destroy_id(queue->cm_id);
>  }
> @@ -739,8 +744,6 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
>  static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
>  		bool remove)
>  {
> -	nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
> -		sizeof(struct nvme_command), DMA_TO_DEVICE);
>  	nvme_rdma_stop_queue(&ctrl->queues[0]);
>  	if (remove) {
>  		blk_cleanup_queue(ctrl->ctrl.admin_q);
> --
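
For reference (as far as I can tell from this version of rdma.c), the buffer
the patch moves the free for is the async event SQE, which is allocated and
DMA-mapped once while the admin queue is being set up, roughly:

	/* sketch of the allocation side, in nvme_rdma_configure_admin_queue();
	 * error handling omitted */
	error = nvme_rdma_alloc_qe(ctrl->queues[0].device->dev,
			&ctrl->async_event_sqe, sizeof(struct nvme_command),
			DMA_TO_DEVICE);

so with the patch the matching nvme_rdma_free_qe() happens only when queue 0
is actually freed in nvme_rdma_free_queue(), instead of on every pass through
nvme_rdma_destroy_admin_queue().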