nvmeof rdma regression issue on 4.14.0-rc1 (or maybe mlx4?)
Sagi Grimberg
sagi at grimberg.me
Mon Oct 2 05:51:43 PDT 2017
Hi,
>>> Panic after connection with below commits, detailed log here:
>>> https://pastebin.com/7z0XSGSd
>>> 31fdf18 nvme-rdma: reuse configure/destroy_admin_queue
>>> 3f02fff nvme-rdma: don't free tagset on resets
>>> 18398af nvme-rdma: disable the controller on resets
>>> b28a308 nvme-rdma: move tagset allocation to a dedicated routine
>>>
>>> good 34b6c23 nvme: Add admin_tagset pointer to nvme_ctrl
>>
>> Is that a reproducible panic? I'm not seeing this at all.
>>
>
> Yes, I can reproduce it every time. The target side was running kernel
> 4.14.0-rc1 when the panic occurred.
>
>> Can you run gdb on nvme-rdma.ko
>> $ l *(nvme_rdma_create_ctrl+0x37d)
>>
> [root at rdma-virt-01 linux ((31fdf18...))]$ gdb
> /usr/lib/modules/4.13.0-rc7.31fdf18+/kernel/drivers/nvme/host/nvme-rdma.ko
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from
> /usr/lib/modules/4.13.0-rc7.31fdf18+/kernel/drivers/nvme/host/nvme-rdma.ko...done.
>
> (gdb) l *(nvme_rdma_create_ctrl+0x37d)
> 0x297d is in nvme_rdma_create_ctrl (drivers/nvme/host/rdma.c:656).
> 651 struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
> 652 struct blk_mq_tag_set *set = admin ?
> 653 &ctrl->admin_tag_set : &ctrl->tag_set;
> 654
> 655 blk_mq_free_tag_set(set);
> 656 nvme_rdma_dev_put(ctrl->device);
> 657 }
> 658
> 659 static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
> 660 		bool admin)
> (gdb)
Let's take this one step at a time, starting with this issue.
First, there must be a reason why a simple create_ctrl fails. Can we
isolate exactly which call fails? Was anything else going on that might
have made the simple create_ctrl fail?
We don't see any "rdma_resolve_addr failed" or "failed to connect queue"
messages, but we do see "creating I/O queues", which means we failed
either at I/O tagset allocation or at initializing connect_q.
There is a missing error code assignment on that path, so can you try
the following patch:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 58983000964b..98dd51e630bd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -765,8 +765,10 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	if (new) {
 		ctrl->ctrl.admin_tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, true);
-		if (IS_ERR(ctrl->ctrl.admin_tagset))
+		if (IS_ERR(ctrl->ctrl.admin_tagset)) {
+			error = PTR_ERR(ctrl->ctrl.admin_tagset);
 			goto out_free_queue;
+		}
 
 		ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
 		if (IS_ERR(ctrl->ctrl.admin_q)) {
@@ -846,8 +848,10 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 	if (new) {
 		ctrl->ctrl.tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, false);
-		if (IS_ERR(ctrl->ctrl.tagset))
+		if (IS_ERR(ctrl->ctrl.tagset)) {
+			ret = PTR_ERR(ctrl->ctrl.tagset);
 			goto out_free_io_queues;
+		}
 
 		ctrl->ctrl.connect_q = blk_mq_init_queue(&ctrl->tag_set);
 		if (IS_ERR(ctrl->ctrl.connect_q)) {
--
Also, can you add the following debug messages to find out what failed?
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 58983000964b..e46475100eea 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -676,6 +676,12 @@ static void nvme_rdma_free_tagset(struct nvme_ctrl *nctrl, bool admin)
 	struct blk_mq_tag_set *set = admin ?
 			&ctrl->admin_tag_set : &ctrl->tag_set;
 
+	if (set == &ctrl->tag_set) {
+		pr_err("%s: freeing IO tagset\n", __func__);
+	} else {
+		pr_err("%s: freeing ADMIN tagset\n", __func__);
+	}
+
 	blk_mq_free_tag_set(set);
 	nvme_rdma_dev_put(ctrl->device);
 }
--