nvmet_rdma crash - DISCONNECT event with NULL queue
Steve Wise
swise at opengridcomputing.com
Wed Nov 2 12:18:27 PDT 2016
> I'll also try and reproduce this on mlx4 to rule out
> iwarp and cxgb4 anomalies.
Running the same test over mlx4/RoCE, I hit a list_add corruption warning from
lib/list_debug.c, followed by a soft lockup on one CPU.
I see this a few times:
[ 916.207157] ------------[ cut here ]------------
[ 916.212455] WARNING: CPU: 1 PID: 5553 at lib/list_debug.c:33
__list_add+0xbe/0xd0
[ 916.220670] list_add corruption. prev->next should be next
(ffffffffa0847070), but was (null). (prev=ffff880833baaf20).
[ 916.233852] Modules linked in: iw_cxgb4 cxgb4 nvmet_rdma nvmet null_blk brd
ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM
iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc
ipmi_devintf cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs
ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib
mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput
iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg lpc_ich
mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler
wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) libata(E) mgag200(E)
ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E)
syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4]
[ 916.337427] CPU: 1 PID: 5553 Comm: kworker/1:15 Tainted: G E
4.8.0+ #131
[ 916.346192] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[ 916.354126] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 916.360096] 0000000000000000 ffff880817483968 ffffffff8135a817
ffffffff8137813e
[ 916.368594] ffff8808174839c8 ffff8808174839c8 0000000000000000
ffff8808174839b8
[ 916.377112] ffffffff81086dad 000000f002080020 0000002134f11400
ffff880834f11470
[ 916.385642] Call Trace:
[ 916.389181] [<ffffffff8135a817>] dump_stack+0x67/0x90
[ 916.395430] [<ffffffff8137813e>] ? __list_add+0xbe/0xd0
[ 916.401863] [<ffffffff81086dad>] __warn+0xfd/0x120
[ 916.407862] [<ffffffff81086e89>] warn_slowpath_fmt+0x49/0x50
[ 916.414741] [<ffffffff8137813e>] __list_add+0xbe/0xd0
[ 916.421034] [<ffffffff816e0be6>] ? mutex_lock+0x16/0x40
[ 916.427522] [<ffffffffa0844d40>] nvmet_rdma_queue_connect+0x110/0x1a0
[nvmet_rdma]
[ 916.436374] [<ffffffffa0845430>] nvmet_rdma_cm_handler+0x100/0x1b0
[nvmet_rdma]
[ 916.444998] [<ffffffffa072e1d0>] cma_req_handler+0x200/0x300 [rdma_cm]
[ 916.452847] [<ffffffffa06f3937>] cm_process_work+0x27/0x100 [ib_cm]
[ 916.460452] [<ffffffffa06f61ea>] cm_req_handler+0x35a/0x540 [ib_cm]
[ 916.468070] [<ffffffffa06f641b>] cm_work_handler+0x4b/0xd0 [ib_cm]
[ 916.475614] [<ffffffff810a1483>] process_one_work+0x183/0x4d0
[ 916.482751] [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[ 916.489539] [<ffffffff816df260>] ? schedule+0x40/0xb0
[ 916.495985] [<ffffffff810a211d>] worker_thread+0x16d/0x530
[ 916.502892] [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[ 916.509730] [<ffffffff810cb9b6>] ? __wake_up_common+0x56/0x90
[ 916.516926] [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[ 916.524568] [<ffffffff816df260>] ? schedule+0x40/0xb0
[ 916.531084] [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[ 916.538758] [<ffffffff810a6c5c>] kthread+0xcc/0xf0
[ 916.545053] [<ffffffff810b1aae>] ? schedule_tail+0x1e/0xc0
[ 916.552082] [<ffffffff816e2eff>] ret_from_fork+0x1f/0x40
[ 916.558935] [<ffffffff810a6b90>] ? kthread_freezable_should_stop+0x70/0x70
[ 916.567430] ---[ end trace a294c05aa08938f6 ]---
...
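
For anyone reading along who is not familiar with the list debug check: the
warning above fires when the entry that is supposed to precede the insertion
point no longer points at its successor. Below is a minimal userspace sketch of
that consistency check, not the kernel's code. The names list_node, list_init,
checked_list_add and checked_list_add_tail are made up for illustration, and
the sketch refuses the insert on a mismatch, whereas the kernel check (as far
as I know, on this kernel version) only warns and then carries on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head (hypothetical name). */
struct list_node {
	struct list_node *next, *prev;
};

/* An empty circular list: the head points at itself in both directions. */
static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Insert new_node between prev and next, but only if the two neighbours
 * still point at each other. Returns false when the list looks corrupted.
 */
static bool checked_list_add(struct list_node *new_node,
			     struct list_node *prev,
			     struct list_node *next)
{
	if (next->prev != prev) {
		fprintf(stderr, "next->prev should be prev (%p), but was %p\n",
			(void *)prev, (void *)next->prev);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "prev->next should be next (%p), but was %p\n",
			(void *)next, (void *)prev->next);
		return false;
	}

	next->prev = new_node;
	new_node->next = next;
	new_node->prev = prev;
	prev->next = new_node;
	return true;
}

/* Append at the tail: the new entry goes between head->prev and head. */
static bool checked_list_add_tail(struct list_node *new_node,
				  struct list_node *head)
{
	return checked_list_add(new_node, head->prev, head);
}
```

To see the same kind of report, link one entry with checked_list_add_tail(),
clear that entry's next and prev pointers while it is still on the list, and
then append a second entry. The second call fails with the "prev->next should
be next" message, which is the shape of the warning in the trace above (prev
is the stale last entry, next is the list head).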
And then a CPU gets stuck:
[ 988.672768] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[kworker/1:12:5549]
[ 988.681814] Modules linked in: iw_cxgb4 cxgb4 nvmet_rdma nvmet null_blk brd
ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM
iptable_mangle iptable_filter ip_tables bridge 8021q mrp garp stp llc
ipmi_devintf cachefiles fscache rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs
ib_umad ocrdma be2net iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx5_ib
mlx5_core mlx4_ib mlx4_en mlx4_core ib_mthca ib_core binfmt_misc dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput
iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr dm_mod i2c_i801 i2c_smbus sg lpc_ich
mfd_core mei_me mei nvme nvme_core igb dca ptp pps_core ipmi_si ipmi_msghandler
wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) libata(E) mgag200(E)
ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E)
syscopyarea(E) i2c_algo_bit(E) i2c_core(E) [last unloaded: cxgb4]
[ 988.786988] CPU: 1 PID: 5549 Comm: kworker/1:12 Tainted: G W EL
4.8.0+ #131
[ 988.796023] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[ 988.804188] Workqueue: events nvmet_keep_alive_timer [nvmet]
[ 988.811068] task: ffff880819328000 task.stack: ffff880819324000
[ 988.818195] RIP: 0010:[<ffffffffa084361c>] [<ffffffffa084361c>]
nvmet_rdma_delete_ctrl+0x3c/0xb0 [nvmet_rdma]
[ 988.829434] RSP: 0018:ffff880819327c58 EFLAGS: 00000287
[ 988.835946] RAX: ffff880834f11b20 RBX: ffff880834f11b20 RCX: 0000000000000000
[ 988.844285] RDX: 0000000000000001 RSI: ffff88085fa58ae0 RDI: ffffffffa0847040
[ 988.852626] RBP: ffff880819327c88 R08: ffff88085fa58ae0 R09: ffff880819327918
[ 988.860968] R10: 0000000000000920 R11: 0000000000000001 R12: ffff880834f11a00
[ 988.869310] R13: ffff88081a6a4800 R14: 0000000000000000 R15: ffff88085fa5d505
[ 988.877655] FS: 0000000000000000(0000) GS:ffff88085fa40000(0000)
knlGS:0000000000000000
[ 988.886955] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 988.893906] CR2: 00007f28fcc6e74b CR3: 0000000001c06000 CR4: 00000000000406e0
[ 988.902246] Stack:
[ 988.905457] ffff880817fc6720 0000000000000002 000000000000000f
ffff88081a6a4800
[ 988.914142] ffff88085fa58ac0 ffff88085fa5d500 ffff880819327ca8
ffffffffa0830237
[ 988.922825] ffff88085fa58ac0 ffff8808584ce900 ffff880819327d88
ffffffff810a1483
[ 988.931507] Call Trace:
[ 988.935152] [<ffffffffa0830237>] nvmet_keep_alive_timer+0x37/0x40 [nvmet]
[ 988.943232] [<ffffffff810a1483>] process_one_work+0x183/0x4d0
[ 988.950273] [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[ 988.956963] [<ffffffff816df260>] ? schedule+0x40/0xb0
[ 988.963299] [<ffffffff8102eb34>] ? __switch_to+0x1e4/0x790
[ 988.970070] [<ffffffff810a211d>] worker_thread+0x16d/0x530
[ 988.976848] [<ffffffff816deda0>] ? __schedule+0x1f0/0x5b0
[ 988.983541] [<ffffffff810cb9b6>] ? __wake_up_common+0x56/0x90
[ 988.990578] [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[ 988.998055] [<ffffffff816df260>] ? schedule+0x40/0xb0
[ 989.004394] [<ffffffff810a1fb0>] ? maybe_create_worker+0x120/0x120
[ 989.011861] [<ffffffff810a6c5c>] kthread+0xcc/0xf0
[ 989.017944] [<ffffffff810b1aae>] ? schedule_tail+0x1e/0xc0
[ 989.024728] [<ffffffff816e2eff>] ret_from_fork+0x1f/0x40
[ 989.031325] [<ffffffff810a6b90>] ? kthread_freezable_should_stop+0x70/0x70
[ 989.039488] Code: 90 49 89 fd 48 c7 c7 40 70 84 a0 e8 cf d5 e9 e0 48 8b 05 68
3a 00 00 48 3d 70 70 84 a0 4c 8d a0 e0 fe ff ff 48 89 c3 75 1c eb 55 <49> 8b 84
24 20 01 00 00 48 3d 70 70 84 a0 4c 8d a0 e0 fe ff ff