nvme-fabrics: crash at nvme connect-all

Steve Wise swise at opengridcomputing.com
Fri Jun 10 09:22:23 PDT 2016


> > Add the hack into iw_cxgb4 to force alloc_mr failures after 200 allocations
> > (or whatever value you need to make it happen).  Then on the same machine,
> > export a target device, load nvme-rdma and discover/connect to that target
> > device with nvme.  It will crash.
> >
> > Unfortunately, with the 4.7-rc2 base I'm using, I get no vmcore dump.  I'm
> > not sure why...
> >
> 
> Previously I was using Doug's rdma rxe branch + sagi's rxe fixes + rebased on
> nvmf-all.2.  To simplify, I have now gone to just straight nvmf-all.2.  Also, I
> separated the host and target to different nodes and reproduced the problem.
> It’s the host side that is crashing.  Same GPF with RIP:
> 
> RIP: 0010:[<ffffffff810d04c3>]  [<ffffffff810d04c3>]
> get_next_timer_interrupt+0x183/0x210
> 
> Steve.
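
For readers following along, the "fail alloc_mr after 200 allocations" hack quoted above can be sketched roughly as below. This is a userspace model only, not the iw_cxgb4 change itself: fake_alloc_mr(), mr_allocs and FAIL_AFTER are hypothetical names invented for illustration, and the threshold of 200 is simply the value mentioned in the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Threshold taken from the quoted description; tune as needed. */
#define FAIL_AFTER 200

/* Number of allocations that have succeeded so far. */
static unsigned long mr_allocs;

/*
 * Hypothetical stand-in for a driver's alloc_mr path (NOT the real
 * iw_cxgb4 code): the first FAIL_AFTER calls succeed, every later
 * call is forced to fail.
 */
static void *fake_alloc_mr(size_t size)
{
    if (mr_allocs >= FAIL_AFTER)
        return NULL;            /* injected failure */
    mr_allocs++;
    return malloc(size);
}
```

With this in place, the 201st call is the first one to fail, which is what pushes the caller onto its error path during queue setup.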

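I enabled a lot of kernel memory debugging and now hit the warning below.  Perhaps a clue?  It looks like nvme_rdma_free_ctrl() is kfree()ing memory that still contains an active timer_list, apparently the timer of a delayed work (ODEBUG hint: delayed_work_timer_fn).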

nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
nvme nvme1: creating 16 I/O queues.
nvme nvme1: Connect rejected, no private data.
nvme nvme1: rdma_resolve_addr wait failed (-104).
nvme nvme1: failed to initialize i/o queue: -104
------------[ cut here ]------------
WARNING: CPU: 1 PID: 10440 at lib/debugobjects.c:263 debug_print_object+0x8e/0xb0
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
Modules linked in: nvme_rdma nvme_fabrics rdma_ucm rdma_cm iw_cm configfs iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en ib_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support pcspkr mlx4_ib ib_core ipv6 mlx4_core dm_mod sg lpc_ich mfd_core i2c_i801 nvme nvme_core igb dca ptp pps_core acpi_cpufreq ext4(E) mbcache(E) jbd2(E) sd_mod(E) nouveau(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) mxm_wmi(E) video(E) ahci(E) libahci(E) wmi(E) [last unloaded: cxgb4]
CPU: 1 PID: 10440 Comm: nvme Tainted: G            E   4.7.0-rc2-nvmf-all.2+ #42
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
 0000000000000000 ffff881027a13a18 ffffffff812f032d ffffffff8130e65e
 ffff881027a13a78 ffff881027a13a78 0000000000000000 ffff881027a13a68
 ffffffff8106694d 0000031800000001 000001072aad7ce8 dead000000000200
Call Trace:
 [<ffffffff812f032d>] dump_stack+0x51/0x74
 [<ffffffff8130e65e>] ? debug_print_object+0x8e/0xb0
 [<ffffffff8106694d>] __warn+0xfd/0x120
 [<ffffffff81066a29>] warn_slowpath_fmt+0x49/0x50
 [<ffffffff81182d72>] ? kfree_const+0x22/0x30
 [<ffffffff8130e65e>] debug_print_object+0x8e/0xb0
 [<ffffffff81080850>] ? __queue_work+0x520/0x520
 [<ffffffff8130ecbe>] __debug_check_no_obj_freed+0x1ee/0x270
 [<ffffffff8130ed57>] debug_check_no_obj_freed+0x17/0x20
 [<ffffffff811c3aac>] kfree+0x9c/0x120
 [<ffffffff81182d72>] ? kfree_const+0x22/0x30
 [<ffffffff812f2f3c>] ? kobject_cleanup+0x9c/0x1b0
 [<ffffffffa04cc696>] nvme_rdma_free_ctrl+0xa6/0xc0 [nvme_rdma]
 [<ffffffffa06fcc36>] nvme_free_ctrl+0x46/0x60 [nvme_core]
 [<ffffffffa06feb2b>] nvme_put_ctrl+0x1b/0x20 [nvme_core]
 [<ffffffffa04cf1a2>] nvme_rdma_create_ctrl+0x412/0x4f0 [nvme_rdma]
 [<ffffffffa04c5d02>] nvmf_create_ctrl+0x182/0x210 [nvme_fabrics]
 [<ffffffffa04c5e3c>] nvmf_dev_write+0xac/0x110 [nvme_fabrics]
 [<ffffffff811d9c24>] __vfs_write+0x34/0x120
 [<ffffffff81002515>] ? trace_event_raw_event_sys_enter+0xb5/0x130
 [<ffffffff811d9dc9>] vfs_write+0xb9/0x130
 [<ffffffff811f9592>] ? __fdget_pos+0x12/0x50
 [<ffffffff811da9b9>] SyS_write+0x59/0xc0
 [<ffffffff81002d6d>] do_syscall_64+0x6d/0x160
 [<ffffffff81642e7c>] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace 7f80ebccfc6bd15d ]---
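
To make the ODEBUG report above concrete, here is a minimal userspace sketch of the bug class it describes: an object with an embedded timer is freed on an error path while the timer is still armed. Everything here is a hypothetical stand-in (fake_timer, fake_ctrl, checked_free() and the two create_ctrl_* functions are not kernel or nvme-rdma APIs), and the trace does not tell us which delayed work is involved. The "fixed" variant only shows the general remedy of cancelling before freeing; it is not claimed to be the right fix for this crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct fake_timer {
    bool pending;                   /* true while the timer is armed */
};

struct fake_ctrl {
    struct fake_timer work_timer;   /* embedded, like a delayed work's timer */
};

static int odebug_warnings;         /* counts "free active" detections */

static void fake_timer_arm(struct fake_timer *t)    { t->pending = true; }
static void fake_timer_cancel(struct fake_timer *t) { t->pending = false; }

/* Models the kind of check debugobjects performs at kfree() time. */
static void checked_free(struct fake_ctrl *ctrl)
{
    assert(ctrl != NULL);
    if (ctrl->work_timer.pending) {
        odebug_warnings++;
        fprintf(stderr, "free active object type: fake_timer\n");
    }
    free(ctrl);
}

/* Error path that frees without cancelling: triggers the warning. */
static int create_ctrl_buggy(void)
{
    struct fake_ctrl *ctrl = calloc(1, sizeof(*ctrl));

    if (!ctrl)
        return -12;
    fake_timer_arm(&ctrl->work_timer);
    /* ... I/O queue setup fails here ... */
    checked_free(ctrl);
    return -104;                    /* mirrors the -104 in the log */
}

/* Same error path, but the timer is cancelled before the free. */
static int create_ctrl_fixed(void)
{
    struct fake_ctrl *ctrl = calloc(1, sizeof(*ctrl));

    if (!ctrl)
        return -12;
    fake_timer_arm(&ctrl->work_timer);
    /* ... I/O queue setup fails here ... */
    fake_timer_cancel(&ctrl->work_timer);
    checked_free(ctrl);
    return -104;
}
```

Calling create_ctrl_buggy() bumps odebug_warnings once and prints the "free active" message; create_ctrl_fixed() returns the same error without tripping the check.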



