nvmeof rdma regression issue on 4.14.0-rc1 (or maybe mlx4?)

Sagi Grimberg sagi at grimberg.me
Mon Oct 2 05:51:43 PDT 2017


Hi,

>>> Panic after connection with below commits, detailed log here: 
>>> https://pastebin.com/7z0XSGSd
>>> 31fdf18     nvme-rdma: reuse configure/destroy_admin_queue
>>> 3f02fff       nvme-rdma: don't free tagset on resets
>>> 18398af    nvme-rdma: disable the controller on resets
>>> b28a308   nvme-rdma: move tagset allocation to a dedicated routine
>>>
>>> good    34b6c23 nvme: Add admin_tagset pointer to nvme_ctrl
>>
>> Is that a reproducible panic? I'm not seeing this at all.
>>
> 
> Yes, I can reproduce it every time. The target-side kernel version was
> 4.14.0-rc1 when the panic occurred.
> 
>> Can you run gdb on nvme-rdma.ko
>> $ l *(nvme_rdma_create_ctrl+0x37d)
>>
> [root@rdma-virt-01 linux ((31fdf18...))]$ gdb /usr/lib/modules/4.13.0-rc7.31fdf18+/kernel/drivers/nvme/host/nvme-rdma.ko
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/lib/modules/4.13.0-rc7.31fdf18+/kernel/drivers/nvme/host/nvme-rdma.ko...done.
> 
> (gdb) l *(nvme_rdma_create_ctrl+0x37d)
> 0x297d is in nvme_rdma_create_ctrl (drivers/nvme/host/rdma.c:656).
> 651        struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
> 652        struct blk_mq_tag_set *set = admin ?
> 653                &ctrl->admin_tag_set : &ctrl->tag_set;
> 654
> 655        blk_mq_free_tag_set(set);
> 656        nvme_rdma_dev_put(ctrl->device);
> 657    }
> 658
> 659    static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
> 660            bool admin)
> (gdb)

Let's take this one step at a time, starting with this issue.

First, something is making a simple create_ctrl fail. Can we isolate
exactly which call fails? Was anything else going on at the time that
might explain the failure?

We don't see any "rdma_resolve_addr failed" or "failed to connect queue"
messages, but we do see "creating I/O queues", which means we failed
either at I/O tagset allocation or while initializing connect_q.
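To make that elimination concrete, here is a minimal userspace sketch (not the nvme-rdma code) of the goto-unwind style used for queue setup, with a message at each failure point. alloc_io_tagset(), init_connect_q(), free_io_tagset(), the fail_step knob and the tagsets_live counter are hypothetical stand-ins for the two suspected calls, and the error codes are arbitrary.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical knobs: which step to fail (0 = none, 1 = tagset,
 * 2 = connect_q) and how many tagsets are currently allocated. */
int fail_step;
int tagsets_live;

/* Hypothetical stand-in for the I/O tagset allocation. */
int alloc_io_tagset(void)
{
	if (fail_step == 1)
		return -ENOMEM;
	tagsets_live++;
	return 0;
}

void free_io_tagset(void)
{
	tagsets_live--;
}

/* Hypothetical stand-in for connect_q initialization. */
int init_connect_q(void)
{
	return fail_step == 2 ? -EINVAL : 0;
}

/* Logs once on entry (like the "creating I/O queues" message), then
 * names the exact step on failure and unwinds what was set up so far. */
int configure_io_queues(void)
{
	int ret;

	fprintf(stderr, "creating I/O queues\n");

	ret = alloc_io_tagset();
	if (ret) {
		fprintf(stderr, "I/O tagset allocation failed: %d\n", ret);
		goto out;
	}

	ret = init_connect_q();
	if (ret) {
		fprintf(stderr, "connect_q init failed: %d\n", ret);
		goto out_free_tagset;
	}

	return 0;

out_free_tagset:
	free_io_tagset();
out:
	return ret;
}
```

In a log that shows only the entry message, nothing distinguishes the two failure paths. Per-step messages like these, or a correctly propagated error code, are what pin the failure to one call.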

We are missing an error code assignment on the tagset allocation failure
paths, so can you try the following patch:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 58983000964b..98dd51e630bd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -765,8 +765,10 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,

         if (new) {
                 ctrl->ctrl.admin_tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, true);
-               if (IS_ERR(ctrl->ctrl.admin_tagset))
+               if (IS_ERR(ctrl->ctrl.admin_tagset)) {
+                       error = PTR_ERR(ctrl->ctrl.admin_tagset);
                         goto out_free_queue;
+               }

                 ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
                 if (IS_ERR(ctrl->ctrl.admin_q)) {
@@ -846,8 +848,10 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)

         if (new) {
                 ctrl->ctrl.tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, false);
-               if (IS_ERR(ctrl->ctrl.tagset))
+               if (IS_ERR(ctrl->ctrl.tagset)) {
+                       ret = PTR_ERR(ctrl->ctrl.tagset);
                         goto out_free_io_queues;
+               }

                 ctrl->ctrl.connect_q = blk_mq_init_queue(&ctrl->tag_set);
                 if (IS_ERR(ctrl->ctrl.connect_q)) {
--
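Why the missing assignment matters: when the allocation fails, the goto reaches the unwind path with the error variable typically still 0 from the preceding successful call, so the function reports success. Below is a userspace sketch of that pattern, assuming simplified re-implementations of ERR_PTR(), PTR_ERR() and IS_ERR() modeled on include/linux/err.h. alloc_tagset(), configure_buggy() and configure_fixed() are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace versions of the helpers in include/linux/err.h:
 * the top MAX_ERRNO addresses encode negative errno values. */
#define MAX_ERRNO 4095

void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct tagset { int nr_hw_queues; };

/* Hypothetical allocator: a valid pointer, or ERR_PTR(-ENOMEM). */
void *alloc_tagset(bool fail)
{
	struct tagset *set = fail ? NULL : calloc(1, sizeof(*set));

	return set ? (void *)set : ERR_PTR(-ENOMEM);
}

/* Same shape as the unpatched code: error is 0 from an earlier step. */
int configure_buggy(bool fail_alloc)
{
	int error = 0;
	void *set = alloc_tagset(fail_alloc);

	if (IS_ERR(set))
		goto out_free_queue;	/* BUG: error is still 0 here */

	free(set);	/* demo only: a real driver keeps the tagset */
	return 0;

out_free_queue:
	return error;
}

/* Same shape as the patched code: propagate the encoded errno. */
int configure_fixed(bool fail_alloc)
{
	int error = 0;
	void *set = alloc_tagset(fail_alloc);

	if (IS_ERR(set)) {
		error = (int)PTR_ERR(set);
		goto out_free_queue;
	}

	free(set);	/* demo only: a real driver keeps the tagset */
	return 0;

out_free_queue:
	return error;
}
```

With fail_alloc set, configure_buggy() returns 0 while configure_fixed() returns -ENOMEM. A caller that sees 0 carries on as if the tagset pointer were valid, even though it is really an encoded error.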

Also, can you add the following debug messages to find out what failed?
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 58983000964b..e46475100eea 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -676,6 +676,12 @@ static void nvme_rdma_free_tagset(struct nvme_ctrl *nctrl, bool admin)
         struct blk_mq_tag_set *set = admin ?
                         &ctrl->admin_tag_set : &ctrl->tag_set;

+       if (set == &ctrl->tag_set) {
+               pr_err("%s: freeing IO tagset\n", __func__);
+       } else {
+               pr_err("%s: freeing ADMIN tagset\n", __func__);
+       }
+
         blk_mq_free_tag_set(set);
         nvme_rdma_dev_put(ctrl->device);
  }
--



More information about the Linux-nvme mailing list