Cannot Connect NVMeoF At Certain NR_IO_Queues Values

Gruher, Joseph R joseph.r.gruher at intel.com
Mon May 14 10:46:23 PDT 2018


I'm running Ubuntu 18.04 with the included 4.15.0 kernel, Mellanox CX4 NICs, and Intel P4800X SSDs.  I'm using nvme-cli v1.5 and nvmetcli v0.6.
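
For reference, this is roughly how I collect the version info above on both boxes (the fw_ver path assumes the CX4 ports register under /sys/class/infiniband, which they should when the mlx5 RDMA stack is loaded):

uname -r
nvme version
cat /sys/class/infiniband/*/fw_ver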

I am getting a connect failure even at a relatively moderate nr_io_queues value such as 8:

rsa at tppjoe01:~$ sudo nvme connect -t rdma -a 10.6.0.16 -i 8 -n NQN1
Failed to write to /dev/nvme-fabrics: Invalid cross-device link

However, it works just fine if I use a smaller value, such as 4:

rsa at tppjoe01:~$ sudo nvme connect -t rdma -a 10.6.0.16 -i 4 -n NQN1
rsa at tppjoe01:~$
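
In case it helps narrow things down, here's a rough sweep I can run to find the exact nr_io_queues value where the connect starts failing (the address, service ID, NQN, and the 1-16 range are just placeholders for my setup); it disconnects after each successful attempt so the connections don't pile up:

#!/bin/bash
# Try increasing nr_io_queues values against one subsystem and report the result.
for q in $(seq 1 16); do
    if sudo nvme connect -t rdma -a 10.6.0.16 -s 4420 -i "$q" -n NQN1 2>/dev/null; then
        echo "nr_io_queues=$q: connected"
        sudo nvme disconnect -n NQN1 >/dev/null
    else
        echo "nr_io_queues=$q: failed (exit $?)"
    fi
    sleep 1
done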

Target side dmesg from a failed connect attempt with -i 8:

[425470.899691] nvmet: creating controller 1 for subsystem NQN1 for NQN nqn.2014-08.org.nvmexpress:uuid:8d0ac789-9136-4275-a46c-8d1223c8fe84.
[425471.081358] nvmet: adding queue 1 to ctrl 1.
[425471.081563] nvmet: adding queue 2 to ctrl 1.
[425471.081758] nvmet: adding queue 3 to ctrl 1.
[425471.110059] nvmet_rdma: freeing queue 3
[425471.110946] nvmet_rdma: freeing queue 1
[425471.111905] nvmet_rdma: freeing queue 2
[425471.382128] nvmet_rdma: freeing queue 4
[425471.522836] nvmet_rdma: freeing queue 5
[425471.640105] nvmet_rdma: freeing queue 7
[425471.669427] nvmet_rdma: freeing queue 6
[425471.670107] nvmet_rdma: freeing queue 0
[425471.692922] nvmet_rdma: freeing queue 8

Initiator side dmesg from the same attempt:

[862316.209664] nvme nvme1: creating 8 I/O queues.
[862316.391411] nvme nvme1: Connect command failed, error wo/DNR bit: -16402
[862316.406271] nvme nvme1: failed to connect queue: 4 ret=-18
[862317.026234] nvme nvme1: Reconnecting in 10 seconds...
[862327.049650] general protection fault: 0000 [#5] SMP PTI
[862327.061932] Modules linked in: ipmi_ssif nls_iso8859_1 intel_rapl skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass joydev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel input_leds intel_cstate intel_rapl_perf mei_me ioatdma mei lpc_ich shpchp ipmi_si ipmi_devintf ipmi_msghandler mac_hid acpi_pad acpi_power_meter sch_fq_codel ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nvmet_rdma nvmet nvme_rdma nvme_fabrics rdmavt rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_core ast igb ttm mlx5_core mlxfw drm_kms_helper dca hid_generic syscopyarea aesni_intel devlink i2c_algo_bit aes_x86_64
[862327.207417]  sysfillrect crypto_simd sysimgblt usbhid uas fb_sys_fops cryptd hid nvme ptp drm glue_helper fm10k usb_storage ahci nvme_core pps_core libahci wmi
[862327.237119] CPU: 13 PID: 25490 Comm: kworker/u305:1 Tainted: G      D          4.15.0-20-generic #21-Ubuntu
[862327.257259] Hardware name: Quanta Cloud Technology Inc. 2U4N system 20F08Axxxx/Single side, BIOS F08A2A12 10/02/2017
[862327.278963] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[862327.293033] RIP: 0010:nvme_rdma_alloc_queue+0x3c/0x190 [nvme_rdma]
[862327.306056] RSP: 0018:ffffb2370e58fe08 EFLAGS: 00010206
[862327.317177] RAX: 0000000000000000 RBX: 5d6e473123dc82a6 RCX: ffff89e9ae817c20
[862327.332105] RDX: ffffffffc057b600 RSI: ffffffffc057a3ab RDI: ffff89edaa897000
[862327.347028] RBP: ffffb2370e58fe28 R08: 000000000000024d R09: 0000000000000000
[862327.361950] R10: 0000000000000000 R11: 00000000003d0900 R12: ffff89edaa897000
[862327.376873] R13: 0000000000000000 R14: 0000000000000020 R15: 0000000000000000
[862327.391796] FS:  0000000000000000(0000) GS:ffff89e9af140000(0000) knlGS:0000000000000000
[862327.408628] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[862327.420780] CR2: 00007f685314edd0 CR3: 0000000209a0a001 CR4: 00000000007606e0
[862327.435709] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[862327.450643] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[862327.465561] PKRU: 55555554
[862327.471641] Call Trace:
[862327.477191]  nvme_rdma_configure_admin_queue+0x22/0x2d0 [nvme_rdma]
[862327.490386]  nvme_rdma_reconnect_ctrl_work+0x27/0xd0 [nvme_rdma]
[862327.503061]  process_one_work+0x1de/0x410
[862327.511741]  worker_thread+0x32/0x410
[862327.519728]  kthread+0x121/0x140
[862327.526847]  ? process_one_work+0x410/0x410
[862327.535863]  ? kthread_create_worker_on_cpu+0x70/0x70
[862327.546632]  ret_from_fork+0x35/0x40
[862327.554446] Code: 89 e5 41 56 41 55 41 54 53 48 8d 1c c5 00 00 00 00 49 89 fc 49 89 c5 49 89 d6 48 29 c3 48 c7 c2 00 b6 57 c0 48 c1 e3 04 48 03 1f <48> 89 7b 18 48 8d 7b 58 c7 43 50 00 00 00 00 e8 c0 8f d5 f0 45
[862327.593340] RIP: nvme_rdma_alloc_queue+0x3c/0x190 [nvme_rdma] RSP: ffffb2370e58fe08
[862327.609350] ---[ end trace a27f36203ed33123 ]---
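
If it helps pin down where nvme_rdma_alloc_queue is blowing up, I can map the faulting offset to a source line with the kernel's faddr2line script.  Rough sketch, assuming an extracted source tree matching the running kernel and a copy of nvme-rdma.ko built with debug info (paths here are placeholders):

cd linux-source-4.15.0
./scripts/faddr2line path/to/nvme-rdma.ko nvme_rdma_alloc_queue+0x3c/0x190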

Some experimenting suggests this problem isn't present in somewhat older kernels: I previously ran 4.13.9 on Ubuntu 16.10 on the same system and could use larger nr_io_queues values, such as 16 and 32, without a problem.  Is this a known issue?  Any other thoughts?  Thanks!

Here's the JSON used to set up the target side:

{
  "hosts": [],
  "ports": [
    {
      "addr": {
        "adrfam": "ipv4",
        "traddr": "10.6.0.16",
        "treq": "not specified",
        "trsvcid": "4420",
        "trtype": "rdma"
      },
      "portid": 1,
      "referrals": [],
      "subsystems": [
        "NQN1",
        "NQN2",
        "NQN3",
        "NQN4"
      ]
    }
  ],
  "subsystems": [
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/nvme1n1"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "NQN1"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/nvme2n1"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "NQN2"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/nvme3n1"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "NQN3"
    },
    {
      "allowed_hosts": [],
      "attr": {
        "allow_any_host": "1"
      },
      "namespaces": [
        {
          "device": {
            "nguid": "00000000-0000-0000-0000-000000000000",
            "path": "/dev/nvme4n1"
          },
          "enable": 1,
          "nsid": 1
        }
      ],
      "nqn": "NQN4"
    }
  ]
}
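
For completeness, the target side is brought up by restoring that JSON with nvmetcli (the file name is just where I happen to keep it) and torn down again with clear:

sudo nvmetcli restore nvmet-config.json
sudo nvmetcli clear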



