NVMEoF oops on reset

Berck Nash Berck.Nash at wdc.com
Wed Feb 7 12:54:16 PST 2018


On 02/06/2018 06:06 PM, Max Gurtovoy wrote:
> On 2/7/2018 12:04 AM, Berck Nash wrote:
>> We're experiencing an oops whenever we issue an "nvme reset" via the
>> nvme cli on fabric setups.  Appears to be in the nvme_rdma code.  The
>> problem occurs on mainline 4.15, as well as on 4.16-nvme (commit
>> ca5554a696dce37852f6d6721520b4f13fc295c3).
> 
> Please try my patches for fixing the state machine (attached).
> These should apply over nvme-4.16, but there is still a missing commit
> from Sagi that I mentioned in the cover letter. So with these 4 patches
> your test should pass...

Thanks, but that doesn't seem to be any better.  I applied all 4 patches
against nvme-4.16 and got a slightly different crash.  The entire log is
attached.

[  478.836005] general protection fault: 0000 [#1] SMP
[  478.836039] Modules linked in: mlx5_ib msdos xfs dm_mod fuse ib_iser 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nvme_rdma 
nvme_fabrics nvme nvme_core ib_umad ib_ucm rdma_ucm ib_uverbs rdma_cm 
iw_cm ib_cm qedr qede qed ib_core sr_mod ses cdrom enclosure vfat fat 
intel_rapl sb_edac ftdi_sio x86_pkg_temp_thermal intel_powerclamp sg 
coretemp mei_me ioatdma mei lpc_ich ipmi_ssif shpchp ipmi_si 
ipmi_devintf ipmi_msghandler acpi_power_meter kvm_intel acpi_pad kvm 
irqbypass nfsd auth_rpcgss nfs_acl netconsole lockd grace sunrpc ext4 
mbcache jbd2 btrfs zstd_decompress zstd_compress xxhash raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 linear uas usb_storage sd_mod ast ttm 
crct10dif_pclmul crc32_pclmul drm_kms_helper crc32c_intel syscopyarea
[  478.836519]  ghash_clmulni_intel sysfillrect pcbc sysimgblt ahci 
fb_sys_fops aesni_intel libahci mlx5_core igb crypto_simd mpt3sas 
glue_helper drm cryptd devlink dca raid_class ptp i2c_algo_bit libata 
pps_core scsi_transport_sas wmi i2c_core sha512_ssse3(E) sha512_generic(E)
[  478.836688] CPU: 1 PID: 3027 Comm: kworker/u40:3 Tainted: G   E    4.15.0-rc4+ #1
[  478.836742] Hardware name: Supermicro SSG-5028R-E1CR12L-CE010/X10SRH-CLN4F, BIOS 1.0c 10/02/2015
[  478.836804] Workqueue: nvme-reset-wq nvme_rdma_reset_ctrl_work [nvme_rdma]
[  478.836855] RIP: 0010:rdma_destroy_id+0x192/0x300 [rdma_cm]
[  478.836892] RSP: 0018:ffffbac5062afdf8 EFLAGS: 00010286
[  478.836929] RAX: dead000000000100 RBX: ffffa063766df6e0 RCX: 0000000000000000
[  478.836976] RDX: dead000000000200 RSI: 000000000000000a RDI: ffffffffc0d4c320
[  478.837023] RBP: ffffa063571f6000 R08: 0000000000000550 R09: 0000000000000000
[  478.838841] R10: ffffbac5062afd20 R11: 000000002b300071 R12: ffffffff86db29c0
[  478.840661] R13: 0000000000000020 R14: ffffa06353a39658 R15: ffffa06353a39660
[  478.842492] FS:  0000000000000000(0000) GS:ffffa0637f040000(0000) knlGS:0000000000000000
[  478.844309] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  478.846114] CR2: 00007fcc02001040 CR3: 0000001e0ec09001 CR4: 00000000001606e0
[  478.847926] Call Trace:
[  478.849727]  nvme_rdma_alloc_queue+0x118/0x180 [nvme_rdma]
[  478.851562]  nvme_rdma_configure_admin_queue+0x1d/0x2c0 [nvme_rdma]
[  478.853388]  nvme_rdma_reset_ctrl_work+0x36/0xc0 [nvme_rdma]
[  478.855202]  process_one_work+0x198/0x370
[  478.856989]  worker_thread+0x1cd/0x390
[  478.858756]  ? process_one_work+0x370/0x370
[  478.860518]  kthread+0x111/0x130
[  478.862261]  ? kthread_create_worker_on_cpu+0x70/0x70
[  478.864024]  ret_from_fork+0x1f/0x30
[  478.865754] Code: 00 00 48 85 db 74 64 48 c7 c7 20 c3 d4 c0 4c 8b a5 
90 01 00 00 e8 2f 6e a0 c5 48 8b 85 c8 01 00 00 48 8b 95 d0 01 00 00 48 
85 c0 <48> 89 02 74 04 48 89 50 08 48 b8 00 01 00 00 00 00 ad de 48 89
[  478.869358] RIP: rdma_destroy_id+0x192/0x300 [rdma_cm] RSP: ffffbac5062afdf8
[  478.871197] ---[ end trace 509c59825297d9a1 ]---
[  478.873914] systemd-journald[631]: Compressed data object 809 -> 584 using XZ
[  478.874342] Kernel panic - not syncing: Fatal exception
[  478.877826] Kernel Offset: 0x5000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  478.882651] ---[ end Kernel panic - not syncing: Fatal exception
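
A note on reading the register dump above (an interpretation, not
something confirmed in this thread): RAX and RDX hold dead000000000100
and dead000000000200, which match LIST_POISON1 and LIST_POISON2 from
include/linux/poison.h in kernels of this vintage, assuming the x86_64
default CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000.  Poison
values in registers at the faulting instruction often point to a list
or hlist entry being deleted a second time, though that is only a
hypothesis here.  Below is a minimal Python sketch for flagging such
values in a pasted oops; the name find_poisoned_registers and the
regular expression are illustrative and not part of any kernel tooling.

```python
import re

# Assumed constants: include/linux/poison.h as of 4.15-era kernels, on
# x86_64 with CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000.
POISON_DELTA = 0xDEAD000000000000
LIST_POISON = {
    POISON_DELTA + 0x100: "LIST_POISON1",
    POISON_DELTA + 0x200: "LIST_POISON2",
}

# Matches general-purpose register fields such as "RAX: dead000000000100".
# \s+ tolerates mail-client wrapping between the name and the value.
# Segment-prefixed fields (RIP: 0010:..., RSP: 0018:...) and control
# registers (CR2, CR3, ...) deliberately do not match.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}):\s+([0-9a-f]{16})\b")


def find_poisoned_registers(log_text):
    """Return {register: poison_name} for registers holding list poison."""
    hits = {}
    for name, value in REG_RE.findall(log_text):
        label = LIST_POISON.get(int(value, 16))
        if label is not None:
            hits[name] = label
    return hits
```

Feeding the oops text to find_poisoned_registers() returns a dict that
maps each matching register name to the poison constant it holds.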
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nvme_reset_with_patches.txt
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20180207/33ea15a1/attachment-0001.txt>

