nvmeof rdma regression issue on 4.14.0-rc1
Yi Zhang
yizhan at redhat.com
Thu Sep 21 05:47:31 PDT 2017
Hi
With the steps below, this issue is easy to reproduce on the host side. I cannot reproduce it on 4.13,
so I think it is a regression in 4.14.0-rc1. Let me know if you need more info.
Host side:
#connect to the target
Target side:
#nvmetcli clear
#sleep 90
#nvmetcli restore /etc/rdma.json
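The reproduction steps above can be sketched as a small POSIX shell script. This is a hedged sketch, not the exact procedure used. The report does not show the host-side connect command, so the `nvme connect-all` invocation is an assumption inferred from the log below (RDMA transport, a discovery controller followed by `testnqn` at 172.31.0.90:4420). The names `DRY_RUN`, `run`, `host_connect` and `target_reset` are hypothetical and introduced only for this sketch. With `DRY_RUN=1` the script prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical reproduction helper for the steps above.
# ASSUMPTION: the host-side connect command is not in the report; the
# nvme-cli call below is inferred from the log (rdma, 172.31.0.90:4420).
# Set DRY_RUN=1 to print the commands instead of executing them.

TRADDR=172.31.0.90      # target address seen in the host log
TRSVCID=4420            # target port seen in the host log
CONFIG=/etc/rdma.json   # saved nvmet configuration from the report

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Host side: discover the target and connect to its subsystems (assumed command).
host_connect() {
    run nvme connect-all -t rdma -a "$TRADDR" -s "$TRSVCID"
}

# Target side: clear the nvmet configuration, wait, then restore it.
# While the configuration is cleared, the host keeps retrying the reconnect.
target_reset() {
    run nvmetcli clear
    run sleep 90
    run nvmetcli restore "$CONFIG"
}

case "${1:-}" in
    host)   host_connect ;;
    target) target_reset ;;
    "")     ;;  # no argument: only define the functions (e.g. when sourced)
    *)      echo "usage: $0 host|target" >&2; exit 1 ;;
esac
```

Saved as a hypothetical `repro.sh`, it would be run as `./repro.sh host` on the host and then `./repro.sh target` on the target. In the log below, the host fails its reconnect attempts every 10 seconds while the configuration is cleared, and the DMAR warnings appear when it reconnects after the restore.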
Host side log:
[ 184.318537] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.0.90:4420
[ 184.502114] nvme nvme0: creating 40 I/O queues.
[ 185.017751] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.0.90:4420
[ 191.303659] nvme nvme0: rescanning
[ 252.226387] nvme nvme0: Reconnecting in 10 seconds...
[ 262.389578] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 262.397084] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 262.403514] nvme nvme0: Failed reconnect attempt 1
[ 262.408875] nvme nvme0: Reconnecting in 10 seconds...
[ 272.616908] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 272.624408] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 272.630839] nvme nvme0: Failed reconnect attempt 2
[ 272.636230] nvme nvme0: Reconnecting in 10 seconds...
[ 282.856896] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 282.864398] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 282.870831] nvme nvme0: Failed reconnect attempt 3
[ 282.876232] nvme nvme0: Reconnecting in 10 seconds...
[ 293.116196] nvme nvme0: creating 40 I/O queues.
[ 293.209662] DMAR: ERROR: DMA PTE for vPFN 0xe0f59 already set (to 10369a9001 not 10115ed001)
[ 293.219117] ------------[ cut here ]------------
[ 293.224284] WARNING: CPU: 14 PID: 751 at drivers/iommu/intel-iommu.c:2305 __domain_mapping+0x367/0x380
[ 293.234698] Modules linked in: nvme_rdma nvme_fabrics nvme_core sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge 8021q garp mrp stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core intel_rapl ipmi_ssif sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore iTCO_wdt ipmi_si intel_rapl_perf iTCO_vendor_support ipmi_devintf dcdbas sg pcspkr ipmi_msghandler ioatdma mei_me mei dca shpchp lpc_ich acpi_pad acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c mlx4_en sd_mod
[ 293.313884] mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core tg3 ahci libahci ptp libata i2c_core crc32c_intel devlink pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 293.335583] CPU: 14 PID: 751 Comm: kworker/u369:7 Not tainted 4.14.0-rc1 #2
[ 293.343374] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.6.2 01/08/2016
[ 293.351750] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[ 293.359249] task: ffff881032ecdd00 task.stack: ffffc900084d8000
[ 293.365873] RIP: 0010:__domain_mapping+0x367/0x380
[ 293.371230] RSP: 0018:ffffc900084dbc60 EFLAGS: 00010202
[ 293.377075] RAX: 0000000000000004 RBX: 00000010115ed001 RCX: 0000000000000000
[ 293.385056] RDX: 0000000000000000 RSI: ffff88103e7ce038 RDI: ffff88103e7ce038
[ 293.393040] RBP: ffffc900084dbcc0 R08: 0000000000000000 R09: 0000000000000000
[ 293.401024] R10: 00000000000002f7 R11: 00000000010115ed R12: ffff88103b9e1ac8
[ 293.409744] R13: 0000000000000001 R14: 0000000000000001 R15: 00000000000e0f59
[ 293.418456] FS: 0000000000000000(0000) GS:ffff88103e7c0000(0000) knlGS:0000000000000000
[ 293.428229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 293.435391] CR2: 0000154ecabc9140 CR3: 0000001005709001 CR4: 00000000001606e0
[ 293.444112] Call Trace:
[ 293.447594] __intel_map_single+0xeb/0x180
[ 293.452918] intel_map_page+0x39/0x40
[ 293.457765] mlx4_ib_alloc_mr+0x141/0x220 [mlx4_ib]
[ 293.463965] ib_alloc_mr+0x26/0x50 [ib_core]
[ 293.469471] nvme_rdma_reinit_request+0x3a/0x70 [nvme_rdma]
[ 293.476433] ? nvme_rdma_free_ctrl+0xb0/0xb0 [nvme_rdma]
[ 293.483100] blk_mq_reinit_tagset+0x5c/0x90
[ 293.488508] nvme_rdma_configure_io_queues+0x211/0x290 [nvme_rdma]
[ 293.496152] nvme_rdma_reconnect_ctrl_work+0x5b/0xd0 [nvme_rdma]
[ 293.503598] process_one_work+0x149/0x360
[ 293.508815] worker_thread+0x4d/0x3c0
[ 293.513638] kthread+0x109/0x140
[ 293.517973] ? rescuer_thread+0x380/0x380
[ 293.523176] ? kthread_park+0x60/0x60
[ 293.527993] ret_from_fork+0x25/0x30
[ 293.532705] Code: b2 aa 81 4c 89 5d a0 4c 89 4d a8 e8 87 e1 c0 ff 8b 05 fe 6e 87 00 4c 8b 4d a8 4c 8b 5d a0 85 c0 74 09 83 e8 01 89 05 e9 6e 87 00 <0f> ff e9 b8 fd ff ff e8 8d c7 ba ff 0f 1f 00 66 2e 0f 1f 84 00
[ 293.555293] ---[ end trace c324b9449c77b573 ]---
[ 293.562133] DMAR: ERROR: DMA PTE for vPFN 0xe0f59 already set (to 10369a9001 not 1002790001)
[ 293.572343] ------------[ cut here ]------------
[ 293.578288] WARNING: CPU: 14 PID: 751 at drivers/iommu/intel-iommu.c:2305 __domain_mapping+0x367/0x380
[ 293.589473] Modules linked in: nvme_rdma nvme_fabrics nvme_core sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge 8021q garp mrp stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core intel_rapl ipmi_ssif sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore iTCO_wdt ipmi_si intel_rapl_perf iTCO_vendor_support ipmi_devintf dcdbas sg pcspkr ipmi_msghandler ioatdma mei_me mei dca shpchp lpc_ich acpi_pad acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c mlx4_en sd_mod
[ 293.674124] mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core tg3 ahci libahci ptp libata i2c_core crc32c_intel devlink pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 293.697394] CPU: 14 PID: 751 Comm: kworker/u369:7 Tainted: G W 4.14.0-rc1 #2
[ 293.707322] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.6.2 01/08/2016
[ 293.716456] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[ 293.724706] task: ffff881032ecdd00 task.stack: ffffc900084d8000
[ 293.732095] RIP: 0010:__domain_mapping+0x367/0x380
[ 293.738214] RSP: 0018:ffffc900084dbc60 EFLAGS: 00010206
[ 293.744810] RAX: 0000000000000003 RBX: 0000001002790001 RCX: 0000000000000000
[ 293.753541] RDX: 0000000000000000 RSI: ffff88103e7ce038 RDI: ffff88103e7ce038
[ 293.762280] RBP: ffffc900084dbcc0 R08: 0000000000000000 R09: 0000000000000000
[ 293.771011] R10: 00000000000003ff R11: 0000000001002790 R12: ffff88103b9e1ac8
[ 293.779752] R13: 0000000000000001 R14: 0000000000000001 R15: 00000000000e0f59
[ 293.788487] FS: 0000000000000000(0000) GS:ffff88103e7c0000(0000) knlGS:0000000000000000
[ 293.798295] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 293.805480] CR2: 0000154ecabc9140 CR3: 0000001005709001 CR4: 00000000001606e0
[ 293.814226] Call Trace:
[ 293.817726] __intel_map_single+0xeb/0x180
[ 293.823058] intel_map_page+0x39/0x40
[ 293.827906] mlx4_ib_alloc_mr+0x141/0x220 [mlx4_ib]
[ 293.834112] ib_alloc_mr+0x26/0x50 [ib_core]
[ 293.839640] nvme_rdma_reinit_request+0x3a/0x70 [nvme_rdma]
[ 293.846628] ? nvme_rdma_free_ctrl+0xb0/0xb0 [nvme_rdma]
[ 293.853322] blk_mq_reinit_tagset+0x5c/0x90
[ 293.858757] nvme_rdma_configure_io_queues+0x211/0x290 [nvme_rdma]
[ 293.866429] nvme_rdma_reconnect_ctrl_work+0x5b/0xd0 [nvme_rdma]
[ 293.873891] process_one_work+0x149/0x360
[ 293.879132] worker_thread+0x4d/0x3c0
[ 293.883979] kthread+0x109/0x140
[ 293.888336] ? rescuer_thread+0x380/0x380
[ 293.893554] ? kthread_park+0x60/0x60
[ 293.898399] ret_from_fork+0x25/0x30
[ 293.903139] Code: b2 aa 81 4c 89 5d a0 4c 89 4d a8 e8 87 e1 c0 ff 8b 05 fe 6e 87 00 4c 8b 4d a8 4c 8b 5d a0 85 c0 74 09 83 e8 01 89 05 e9 6e 87 00 <0f> ff e9 b8 fd ff ff e8 8d c7 ba ff 0f 1f 00 66 2e 0f 1f 84 00
[ 293.925798] ---[ end trace c324b9449c77b574 ]---
[ 294.460186] nvme nvme0: Successfully reconnected
[ 294.460243] nvme nvme0: MEMREG for CQE 0xffff88100663a640 failed with status memory management operation error (6)
[ 325.850105] general protection fault: 0000 [#1] SMP
[ 325.856354] Modules linked in: nvme_rdma nvme_fabrics nvme_core sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge 8021q garp mrp stp llc rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core intel_rapl ipmi_ssif sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore iTCO_wdt ipmi_si intel_rapl_perf iTCO_vendor_support ipmi_devintf dcdbas sg pcspkr ipmi_msghandler ioatdma mei_me mei dca shpchp lpc_ich acpi_pad acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c mlx4_en sd_mod
[ 325.940982] mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core tg3 ahci libahci ptp libata i2c_core crc32c_intel devlink pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 325.964245] CPU: 22 PID: 3707 Comm: git Tainted: G W 4.14.0-rc1 #2
[ 325.973198] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.6.2 01/08/2016
[ 325.982332] task: ffff881036451740 task.stack: ffffc90009368000
[ 325.989746] RIP: 0010:xfs_attr_shortform_getvalue+0x25/0x120 [xfs]
[ 325.997412] RSP: 0018:ffffc9000936b920 EFLAGS: 00010282
[ 326.004006] RAX: 63656a626f3a755f RBX: ffffc9000936b980 RCX: ffffc9000936b980
[ 326.012738] RDX: 0000000001000000 RSI: ffffc9000936b980 RDI: ffffc9000936b980
[ 326.021465] RBP: ffffc9000936b958 R08: ffffffff81a740c0 R09: 0000000000000000
[ 326.030194] R10: 0000000000000008 R11: ffff881002f52700 R12: ffffc9000936ba54
[ 326.038913] R13: ffff882023b9fc00 R14: 0000000000000000 R15: 0000000000000008
[ 326.047642] FS: 000014c3de00e740(0000) GS:ffff88103e8c0000(0000) knlGS:0000000000000000
[ 326.057445] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 326.064627] CR2: 00000000007b7d44 CR3: 000000018b0cb005 CR4: 00000000001606e0
[ 326.073370] Call Trace:
[ 326.076885] xfs_attr_get_ilocked+0x63/0x70 [xfs]
[ 326.082917] xfs_attr_get+0xca/0x120 [xfs]
[ 326.088269] xfs_xattr_get+0x4c/0x70 [xfs]
[ 326.093592] __vfs_getxattr+0x57/0x70
[ 326.098428] inode_doinit_with_dentry+0x33c/0x580
[ 326.104430] selinux_d_instantiate+0x1c/0x20
[ 326.109947] security_d_instantiate+0x32/0x50
[ 326.115558] d_splice_alias+0x4c/0x370
[ 326.120505] xfs_vn_lookup+0x87/0xb0 [xfs]
[ 326.125819] lookup_slow+0xa2/0x160
[ 326.130448] walk_component+0x160/0x250
[ 326.135461] ? legitimize_path.isra.33+0x2e/0x60
[ 326.141346] path_lookupat+0x79/0x210
[ 326.146162] ? cpumask_any_but+0x31/0x40
[ 326.151267] filename_lookup+0xaf/0x190
[ 326.156276] ? kmem_cache_alloc+0x9c/0x1b0
[ 326.161576] ? getname_flags+0x4f/0x1f0
[ 326.166583] ? getname_flags+0x6f/0x1f0
[ 326.171586] user_path_at_empty+0x36/0x40
[ 326.176768] vfs_statx+0x77/0xe0
[ 326.181058] SYSC_newlstat+0x3d/0x70
[ 326.185726] ? __audit_syscall_entry+0xaf/0x100
[ 326.191463] ? syscall_trace_enter+0x1d0/0x2b0
[ 326.197090] ? __audit_syscall_exit+0x209/0x290
[ 326.202799] SyS_newlstat+0xe/0x10
[ 326.207226] do_syscall_64+0x67/0x180
[ 326.211926] entry_SYSCALL64_slow_path+0x25/0x25
[ 326.217674] RIP: 0033:0x14c3dd4961a5
[ 326.222237] RSP: 002b:00007fff730b3478 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[ 326.231250] RAX: ffffffffffffffda RBX: 0000000000ca3c50 RCX: 000014c3dd4961a5
[ 326.239774] RDX: 00007fff730b3530 RSI: 00007fff730b3530 RDI: 0000000000ca3ca0
[ 326.248281] RBP: 0000000000ca3ca0 R08: 0000000000000000 R09: 0000000000000000
[ 326.256770] R10: 0000000000000001 R11: 0000000000000246 R12: 00007fff730b3530
[ 326.265239] R13: 0000000000000000 R14: 0000000000000170 R15: 00007fff730b3530
[ 326.273684] Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f9 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 10 48 8b 47 38 48 8b 40 38 48 8b 40 18 <44> 0f b6 70 02 48 8d 58 04 b8 c3 ff ff ff 45 85 f6 0f 84 82 00
[ 326.295785] RIP: xfs_attr_shortform_getvalue+0x25/0x120 [xfs] RSP: ffffc9000936b920
[ 326.304910] ---[ end trace c324b9449c77b575 ]---
[ 326.362863] Kernel panic - not syncing: Fatal exception
[ 326.362964] Kernel Offset: disabled
[ 326.427058] ---[ end Kernel panic - not syncing: Fatal exception
Best Regards,
Yi Zhang