[PATCH] nvme-rdma: do not try to stop unallocated queues

Yi Zhang yi.zhang at redhat.com
Sun Aug 20 22:48:16 PDT 2023


Tested-by: Yi Zhang <yi.zhang at redhat.com>

Thanks Maurizio, I verified that the warning issue below is fixed by this patch:

[  849.714153] nvme nvme2: Reconnecting in 10 seconds...
[  859.925767] nvme nvme2: Connect rejected: status -104 (reset by remote host).
[  859.933169] nvme nvme2: rdma connection establishment failed (-104)
[  859.949151] nvme nvme2: Failed reconnect attempt 60
[  859.954682] nvme nvme2: Removing ctrl: NQN "testnqn"
[  860.071263] ------------[ cut here ]------------
[  860.075909] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[  860.075924] WARNING: CPU: 4 PID: 355 at kernel/locking/mutex.c:582 __mutex_lock+0x116b/0x1490
[  860.089432] Modules linked in: nvme_rdma nvme_tcp nvme_fabrics
sch_mqprio sch_mqprio_lib 8021q garp mrp stp llc rfkill vfat fat
opa_vnic intel_rapl_msr intel_rapl_common intel_uncore_frequency
intel_uncore_frequency_common rpcrdma isst_if_common skx_edac nfit
libnvdimm sunrpc ipmi_ssif x86_pkg_temp_thermal intel_powerclamp
coretemp rdma_ucm ib_srpt kvm_intel ib_isert ib_umad hfi1
iscsi_target_mod kvm ib_ipoib target_core_mod rdmavt mgag200 ib_iser
irqbypass iTCO_wdt rapl iTCO_vendor_support libiscsi drm_shmem_helper
intel_cstate dell_smbios scsi_transport_iscsi dcdbas mlx5_ib
intel_uncore wmi_bmof dell_wmi_descriptor drm_kms_helper i2c_algo_bit
acpi_ipmi mei_me pcspkr i2c_i801 mei lpc_ich ipmi_si i2c_smbus
intel_pch_thermal ipmi_devintf ipmi_msghandler iw_cxgb4
acpi_power_meter bnxt_re libcxgb rdma_cm ib_uverbs iw_cm ib_cm ib_core
drm fuse xfs libcrc32c sd_mod sg crct10dif_pclmul csiostor
crc32_pclmul crc32c_intel mlx5_core cxgb4 nvme ahci libahci mlxfw
nvme_core psample bnxt_en ghash_clmulni_intel nvme_common
[  860.089610]  pci_hyperv_intf libata megaraid_sas tg3 tls wmi
scsi_transport_fc t10_pi dm_mirror dm_region_hash dm_log dm_mod
[  860.190140] CPU: 4 PID: 355 Comm: kworker/u97:14 Tainted: G        W          6.5.0-rc6+ #1
[  860.198492] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.13.3 12/13/2021
[  860.206066] Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
[  860.212722] RIP: 0010:__mutex_lock+0x116b/0x1490
[  860.217349] Code: 08 84 d2 0f 85 cf 02 00 00 8b 05 5c 26 f4 01 85 c0 0f 85 c9 ef ff ff 48 c7 c6 a0 39 ae b6 48 c7 c7 80 37 ae b6 e8 d5 07 bf fd <0f> 0b e9 af ef ff ff 65 48 8b 1c 25 00 3e 20 00 48 89 d8 4d 89 ef
[  860.236105] RSP: 0018:ffffc9000810faf0 EFLAGS: 00010286
[  860.241342] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  860.248479] RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000001
[  860.255619] RBP: ffffc9000810fc50 R08: 0000000000000001 R09: ffffed117ad7df99
[  860.262761] R10: ffff888bd6befccb R11: 0000000000000001 R12: 0000000000000000
[  860.269901] R13: ffff888c50158210 R14: dffffc0000000000 R15: 0000000000000002
[  860.277046] FS:  0000000000000000(0000) GS:ffff888bd6a00000(0000) knlGS:0000000000000000
[  860.285146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  860.290900] CR2: 00007fb92d0694d0 CR3: 000000091f06c003 CR4: 00000000007706e0
[  860.298042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  860.305186] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  860.312324] PKRU: 55555554
[  860.315045] Call Trace:
[  860.317499]  <TASK>
[  860.319615]  ? __warn+0xc9/0x350
[  860.322855]  ? __mutex_lock+0x116b/0x1490
[  860.326877]  ? report_bug+0x326/0x3c0
[  860.330553]  ? handle_bug+0x3c/0x70
[  860.334060]  ? exc_invalid_op+0x14/0x50
[  860.337909]  ? asm_exc_invalid_op+0x16/0x20
[  860.342119]  ? __mutex_lock+0x116b/0x1490
[  860.346142]  ? __pfx___lock_acquired+0x10/0x10
[  860.350605]  ? nvme_rdma_stop_queue+0x1b/0xa0 [nvme_rdma]
[  860.356020]  ? find_held_lock+0x33/0x120
[  860.359957]  ? local_clock_noinstr+0x9/0xc0
[  860.364150]  ? __pfx___mutex_lock+0x10/0x10
[  860.368350]  ? __pfx___lock_release+0x10/0x10
[  860.372728]  ? down_read+0xbe/0x4c0
[  860.376232]  ? __up_read+0x1fe/0x760
[  860.379818]  ? __kmem_cache_free+0xc2/0x2c0
[  860.384015]  ? __pfx___up_read+0x10/0x10
[  860.387953]  ? nvme_rdma_stop_queue+0x1b/0xa0 [nvme_rdma]
[  860.393363]  nvme_rdma_stop_queue+0x1b/0xa0 [nvme_rdma]
[  860.398598]  nvme_rdma_teardown_io_queues.part.0+0xc6/0x210 [nvme_rdma]
[  860.405223]  nvme_rdma_delete_ctrl+0x59/0x110 [nvme_rdma]
[  860.410639]  nvme_do_delete_ctrl+0x14b/0x230 [nvme_core]
[  860.415976]  process_one_work+0x952/0x1660
[  860.420088]  ? __lock_acquired+0x207/0x7b0
[  860.424199]  ? __pfx_process_one_work+0x10/0x10
[  860.428742]  ? __pfx___lock_acquired+0x10/0x10
[  860.433203]  ? worker_thread+0x15a/0xef0
[  860.437140]  worker_thread+0x5be/0xef0
[  860.440903]  ? __pfx_worker_thread+0x10/0x10
[  860.445180]  kthread+0x2f1/0x3d0
[  860.448420]  ? __pfx_kthread+0x10/0x10
[  860.452184]  ret_from_fork+0x2d/0x70
[  860.455772]  ? __pfx_kthread+0x10/0x10
[  860.459532]  ret_from_fork_asm+0x1b/0x30
[  860.463478]  </TASK>
[  860.465678] irq event stamp: 769087
[  860.469178] hardirqs last  enabled at (769087): [<ffffffffb5202c31>] __free_object+0x611/0xcf0
[  860.477790] hardirqs last disabled at (769086): [<ffffffffb5202d92>] __free_object+0x772/0xcf0
[  860.486407] softirqs last  enabled at (768672): [<ffffffffb66571ab>] __do_softirq+0x5db/0x8f6
[  860.494933] softirqs last disabled at (768665): [<ffffffffb424b63c>] __irq_exit_rcu+0xbc/0x210
[  860.503551] ---[ end trace 0000000000000000 ]---
[  860.540254] nvme nvme2: Property Set error: 880, offset 0x14
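For reference, here is how I read the failure mode: the controller never
recovers from the failed reconnects, so some I/O queues are never
(re)allocated, yet the delete path (nvme_rdma_delete_ctrl ->
nvme_rdma_teardown_io_queues -> nvme_rdma_stop_queue) still walks every
queue and takes queue->queue_lock, a mutex that was never mutex_init()'d
for those queues, which is exactly what DEBUG_LOCKS complains about. A
condensed sketch of the two paths (simplified from
drivers/nvme/host/rdma.c, not verbatim; error handling and unrelated
setup omitted):

	/*
	 * queue->queue_lock only becomes a valid mutex once
	 * nvme_rdma_alloc_queue() has run for that queue;
	 * NVME_RDMA_Q_ALLOCATED records that fact.
	 */
	static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl, int idx,
			size_t queue_size)
	{
		struct nvme_rdma_queue *queue = &ctrl->queues[idx];

		mutex_init(&queue->queue_lock);	/* lock is valid from here on */
		/* ... CM ID creation, address resolution, etc. omitted ... */
		set_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags);
		return 0;
	}

	/*
	 * Stop path as it looks with the patch applied: bail out early if
	 * the queue was never allocated, so mutex_lock() never runs on an
	 * uninitialized queue_lock.
	 */
	static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
	{
		if (!test_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
			return;

		mutex_lock(&queue->queue_lock);
		if (test_and_clear_bit(NVME_RDMA_Q_LIVE, &queue->flags))
			__nvme_rdma_stop_queue(queue);
		mutex_unlock(&queue->queue_lock);
	}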



On Mon, Jul 31, 2023 at 6:43 PM Maurizio Lombardi <mlombard at redhat.com> wrote:
>
> Trying to stop a queue which hasn't been allocated will result
> in a warning due to calling mutex_lock() against an uninitialized mutex.
>
>  DEBUG_LOCKS_WARN_ON(lock->magic != lock)
>  WARNING: CPU: 4 PID: 104150 at kernel/locking/mutex.c:579
>
>  Call trace:
>   RIP: 0010:__mutex_lock+0x1173/0x14a0
>   nvme_rdma_stop_queue+0x1b/0xa0 [nvme_rdma]
>   nvme_rdma_teardown_io_queues.part.0+0xb0/0x1d0 [nvme_rdma]
>   nvme_rdma_delete_ctrl+0x50/0x100 [nvme_rdma]
>   nvme_do_delete_ctrl+0x149/0x158 [nvme_core]
>
> Signed-off-by: Maurizio Lombardi <mlombard at redhat.com>
> ---
>  drivers/nvme/host/rdma.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index d433b2ec07a6..00b13336125e 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -638,6 +638,9 @@ static void __nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
>
>  static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
>  {
> +       if (!test_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
> +               return;
> +
>         mutex_lock(&queue->queue_lock);
>         if (test_and_clear_bit(NVME_RDMA_Q_LIVE, &queue->flags))
>                 __nvme_rdma_stop_queue(queue);
> --
> 2.39.3
>
>


-- 
Best Regards,
  Yi Zhang



