[PATCH] nvme-rdma: do not try to stop unallocated queues
Yi Zhang
yi.zhang at redhat.com
Sun Aug 20 22:48:16 PDT 2023
Tested-by: Yi Zhang <yi.zhang at redhat.com>
Thanks Maurizio, I verified that the warning below is fixed by this patch:
[ 849.714153] nvme nvme2: Reconnecting in 10 seconds...
[ 859.925767] nvme nvme2: Connect rejected: status -104 (reset by remote host).
[ 859.933169] nvme nvme2: rdma connection establishment failed (-104)
[ 859.949151] nvme nvme2: Failed reconnect attempt 60
[ 859.954682] nvme nvme2: Removing ctrl: NQN "testnqn"
[ 860.071263] ------------[ cut here ]------------
[ 860.075909] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 860.075924] WARNING: CPU: 4 PID: 355 at kernel/locking/mutex.c:582 __mutex_lock+0x116b/0x1490
[ 860.089432] Modules linked in: nvme_rdma nvme_tcp nvme_fabrics sch_mqprio sch_mqprio_lib 8021q garp mrp stp llc rfkill vfat fat opa_vnic intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common rpcrdma isst_if_common skx_edac nfit libnvdimm sunrpc ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp rdma_ucm ib_srpt kvm_intel ib_isert ib_umad hfi1 iscsi_target_mod kvm ib_ipoib target_core_mod rdmavt mgag200 ib_iser irqbypass iTCO_wdt rapl iTCO_vendor_support libiscsi drm_shmem_helper intel_cstate dell_smbios scsi_transport_iscsi dcdbas mlx5_ib intel_uncore wmi_bmof dell_wmi_descriptor drm_kms_helper i2c_algo_bit acpi_ipmi mei_me pcspkr i2c_i801 mei lpc_ich ipmi_si i2c_smbus intel_pch_thermal ipmi_devintf ipmi_msghandler iw_cxgb4 acpi_power_meter bnxt_re libcxgb rdma_cm ib_uverbs iw_cm ib_cm ib_core drm fuse xfs libcrc32c sd_mod sg crct10dif_pclmul csiostor crc32_pclmul crc32c_intel mlx5_core cxgb4 nvme ahci libahci mlxfw nvme_core psample bnxt_en ghash_clmulni_intel nvme_common
[ 860.089610] pci_hyperv_intf libata megaraid_sas tg3 tls wmi scsi_transport_fc t10_pi dm_mirror dm_region_hash dm_log dm_mod
[ 860.190140] CPU: 4 PID: 355 Comm: kworker/u97:14 Tainted: G W 6.5.0-rc6+ #1
[ 860.198492] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.13.3 12/13/2021
[ 860.206066] Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
[ 860.212722] RIP: 0010:__mutex_lock+0x116b/0x1490
[ 860.217349] Code: 08 84 d2 0f 85 cf 02 00 00 8b 05 5c 26 f4 01 85 c0 0f 85 c9 ef ff ff 48 c7 c6 a0 39 ae b6 48 c7 c7 80 37 ae b6 e8 d5 07 bf fd <0f> 0b e9 af ef ff ff 65 48 8b 1c 25 00 3e 20 00 48 89 d8 4d 89 ef
[ 860.236105] RSP: 0018:ffffc9000810faf0 EFLAGS: 00010286
[ 860.241342] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 860.248479] RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000001
[ 860.255619] RBP: ffffc9000810fc50 R08: 0000000000000001 R09: ffffed117ad7df99
[ 860.262761] R10: ffff888bd6befccb R11: 0000000000000001 R12: 0000000000000000
[ 860.269901] R13: ffff888c50158210 R14: dffffc0000000000 R15: 0000000000000002
[ 860.277046] FS: 0000000000000000(0000) GS:ffff888bd6a00000(0000) knlGS:0000000000000000
[ 860.285146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 860.290900] CR2: 00007fb92d0694d0 CR3: 000000091f06c003 CR4: 00000000007706e0
[ 860.298042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 860.305186] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 860.312324] PKRU: 55555554
[ 860.315045] Call Trace:
[ 860.317499] <TASK>
[ 860.319615] ? __warn+0xc9/0x350
[ 860.322855] ? __mutex_lock+0x116b/0x1490
[ 860.326877] ? report_bug+0x326/0x3c0
[ 860.330553] ? handle_bug+0x3c/0x70
[ 860.334060] ? exc_invalid_op+0x14/0x50
[ 860.337909] ? asm_exc_invalid_op+0x16/0x20
[ 860.342119] ? __mutex_lock+0x116b/0x1490
[ 860.346142] ? __pfx___lock_acquired+0x10/0x10
[ 860.350605] ? nvme_rdma_stop_queue+0x1b/0xa0 [nvme_rdma]
[ 860.356020] ? find_held_lock+0x33/0x120
[ 860.359957] ? local_clock_noinstr+0x9/0xc0
[ 860.364150] ? __pfx___mutex_lock+0x10/0x10
[ 860.368350] ? __pfx___lock_release+0x10/0x10
[ 860.372728] ? down_read+0xbe/0x4c0
[ 860.376232] ? __up_read+0x1fe/0x760
[ 860.379818] ? __kmem_cache_free+0xc2/0x2c0
[ 860.384015] ? __pfx___up_read+0x10/0x10
[ 860.387953] ? nvme_rdma_stop_queue+0x1b/0xa0 [nvme_rdma]
[ 860.393363] nvme_rdma_stop_queue+0x1b/0xa0 [nvme_rdma]
[ 860.398598] nvme_rdma_teardown_io_queues.part.0+0xc6/0x210 [nvme_rdma]
[ 860.405223] nvme_rdma_delete_ctrl+0x59/0x110 [nvme_rdma]
[ 860.410639] nvme_do_delete_ctrl+0x14b/0x230 [nvme_core]
[ 860.415976] process_one_work+0x952/0x1660
[ 860.420088] ? __lock_acquired+0x207/0x7b0
[ 860.424199] ? __pfx_process_one_work+0x10/0x10
[ 860.428742] ? __pfx___lock_acquired+0x10/0x10
[ 860.433203] ? worker_thread+0x15a/0xef0
[ 860.437140] worker_thread+0x5be/0xef0
[ 860.440903] ? __pfx_worker_thread+0x10/0x10
[ 860.445180] kthread+0x2f1/0x3d0
[ 860.448420] ? __pfx_kthread+0x10/0x10
[ 860.452184] ret_from_fork+0x2d/0x70
[ 860.455772] ? __pfx_kthread+0x10/0x10
[ 860.459532] ret_from_fork_asm+0x1b/0x30
[ 860.463478] </TASK>
[ 860.465678] irq event stamp: 769087
[ 860.469178] hardirqs last enabled at (769087): [<ffffffffb5202c31>] __free_object+0x611/0xcf0
[ 860.477790] hardirqs last disabled at (769086): [<ffffffffb5202d92>] __free_object+0x772/0xcf0
[ 860.486407] softirqs last enabled at (768672): [<ffffffffb66571ab>] __do_softirq+0x5db/0x8f6
[ 860.494933] softirqs last disabled at (768665): [<ffffffffb424b63c>] __irq_exit_rcu+0xbc/0x210
[ 860.503551] ---[ end trace 0000000000000000 ]---
[ 860.540254] nvme nvme2: Property Set error: 880, offset 0x14
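For reference, the crux of the trace: teardown can reach
nvme_rdma_stop_queue() for a queue whose queue_lock was never
initialized because the queue was never allocated, and with mutex
debugging enabled the magic check trips as
DEBUG_LOCKS_WARN_ON(lock->magic != lock). The patch below simply makes
the stop path a no-op unless NVME_RDMA_Q_ALLOCATED is set. A rough
userspace analogue of that guard (my own sketch, not driver code; the
demo_* names and the pthread/stdatomic mapping are purely
illustrative):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

enum { DEMO_Q_ALLOCATED, DEMO_Q_LIVE };

struct demo_queue {
	atomic_ulong flags;
	pthread_mutex_t queue_lock;	/* valid only once ALLOCATED is set */
};

static void demo_alloc_queue(struct demo_queue *q)
{
	pthread_mutex_init(&q->queue_lock, NULL);
	/* set the flag only after the mutex is usable */
	atomic_fetch_or(&q->flags, 1UL << DEMO_Q_ALLOCATED);
	atomic_fetch_or(&q->flags, 1UL << DEMO_Q_LIVE);
}

static void demo_stop_queue(struct demo_queue *q)
{
	/* the guard the patch adds: never lock an uninitialized mutex */
	if (!(atomic_load(&q->flags) & (1UL << DEMO_Q_ALLOCATED)))
		return;

	pthread_mutex_lock(&q->queue_lock);
	/* test_and_clear_bit() analogue: stop a live queue exactly once */
	if (atomic_fetch_and(&q->flags, ~(1UL << DEMO_Q_LIVE)) &
	    (1UL << DEMO_Q_LIVE))
		puts("queue stopped");
	pthread_mutex_unlock(&q->queue_lock);
}

int main(void)
{
	/* queue_lock never passed through pthread_mutex_init() */
	struct demo_queue q = { .flags = 0 };

	demo_stop_queue(&q);	/* safe no-op thanks to the guard */
	demo_alloc_queue(&q);
	demo_stop_queue(&q);	/* prints "queue stopped" */
	return 0;
}

The ordering in the sketch is what makes the lock-free flag test safe:
the allocation path initializes the mutex before it sets the ALLOCATED
bit, so a stop path that observes the bit set can take the lock. As far
as I can tell this mirrors how nvme_rdma_alloc_queue() orders
mutex_init() against set_bit(NVME_RDMA_Q_ALLOCATED).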
On Mon, Jul 31, 2023 at 6:43 PM Maurizio Lombardi <mlombard at redhat.com> wrote:
>
> Trying to stop a queue that hasn't been allocated will result in a
> warning, because mutex_lock() is called on an uninitialized mutex.
>
> DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> WARNING: CPU: 4 PID: 104150 at kernel/locking/mutex.c:579
>
> Call trace:
> RIP: 0010:__mutex_lock+0x1173/0x14a0
> nvme_rdma_stop_queue+0x1b/0xa0 [nvme_rdma]
> nvme_rdma_teardown_io_queues.part.0+0xb0/0x1d0 [nvme_rdma]
> nvme_rdma_delete_ctrl+0x50/0x100 [nvme_rdma]
> nvme_do_delete_ctrl+0x149/0x158 [nvme_core]
>
> Signed-off-by: Maurizio Lombardi <mlombard at redhat.com>
> ---
> drivers/nvme/host/rdma.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index d433b2ec07a6..00b13336125e 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -638,6 +638,9 @@ static void __nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
>
> static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
> {
> + if (!test_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
> + return;
> +
> mutex_lock(&queue->queue_lock);
> if (test_and_clear_bit(NVME_RDMA_Q_LIVE, &queue->flags))
> __nvme_rdma_stop_queue(queue);
> --
> 2.39.3
>
>
--
Best Regards,
Yi Zhang