NVMe issues with NVMe rescan/reset/remove operation

Yi Zhang yizhan at redhat.com
Sun Feb 19 22:33:56 PST 2017


Hi 

I found several issues during NVMe rescan/reset/remove with IO on 4.10.0-rc8, could you help check it, thanks.

Steps I used:
#fio -filename=/dev/nvme0n1p1 -iodepth=1 -thread -rw=randwrite -ioengine=psync -bssplit=5k/10:9k/10:13k/10:17k/10:21k/10:25k/10:29k/10:33k/10:37k/10:41k/10 -bs_unaligned -runtime=1200 -size=-group_reporting -name=mytest -numjobs=60 &
#lspci | grep -i nvme
84:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 172X (rev 01)
#sleep 35
#echo 1 > /sys/bus/pci/devices/0000:84:00.0/rescan
#echo 1 > /sys/bus/pci/devices/0000:84:00.0/reset
#echo 1 > /sys/bus/pci/devices/0000:84:00.0/remove

1. kernel BUG at block/blk-mq.c:374!
   Full log: http://pastebin.com/fymFAxjP

[  129.974989] kernel BUG at block/blk-mq.c:374!
[  129.979849] invalid opcode: 0000 [#1] SMP
[  129.984318] Modules linked in: ipmi_ssif vfat fat intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support intel_cstate mei_me mei intel_uncore mxm_wmi ipmi_si dcdbas intel_rapl_perf lpc_ich ipmi_devintf pcspkr sg ipmi_msghandler shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd grace dm_multipath sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm nvme crc32c_intel nvme_core ahci i2c_core libahci libata tg3 megaraid_sas ptp pps_core fjes dm_mirror dm_region_hash dm_log dm_mod
[  130.051563] CPU: 2 PID: 1287 Comm: kworker/2:1H Not tainted 4.10.0-rc8 #1
[  130.059139] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  130.067689] Workqueue: kblockd blk_mq_timeout_work
[  130.073033] task: ffff88027373ad00 task.stack: ffffc900028c0000
[  130.079639] RIP: 0010:blk_mq_end_request+0x58/0x70
[  130.084982] RSP: 0018:ffffc900028c3d50 EFLAGS: 00010202
[  130.090810] RAX: 0000000000000001 RBX: ffff8804712260c0 RCX: ffff880167377d88
[  130.098771] RDX: 0000000000001000 RSI: 0000000000001000 RDI: 0000000000000000
[  130.106732] RBP: ffffc900028c3d60 R08: 0000000000000006 R09: ffff880167377d00
[  130.114694] R10: 0000000000001000 R11: 0000000000000001 R12: 00000000fffffffb
[  130.122656] R13: ffff8804709be300 R14: 0000000000000002 R15: ffff880471bccb40
[  130.130619] FS:  0000000000000000(0000) GS:ffff880277c40000(0000) knlGS:0000000000000000
[  130.139647] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  130.146058] CR2: 00007f77f827ef78 CR3: 0000000384a19000 CR4: 00000000001406e0
[  130.154018] Call Trace:
[  130.156750]  blk_mq_check_expired+0x76/0x80
[  130.161417]  bt_iter+0x45/0x50
[  130.164823]  blk_mq_queue_tag_busy_iter+0xdd/0x1f0
[  130.170170]  ? blk_mq_rq_timed_out+0x70/0x70
[  130.174933]  ? blk_mq_rq_timed_out+0x70/0x70
[  130.179698]  ? __switch_to+0x140/0x450
[  130.183879]  blk_mq_timeout_work+0x88/0x170
[  130.188549]  process_one_work+0x165/0x410
[  130.193014]  worker_thread+0x137/0x4c0
[  130.197195]  kthread+0x101/0x140
[  130.200794]  ? rescuer_thread+0x3b0/0x3b0
[  130.205265]  ? kthread_park+0x90/0x90
[  130.209353]  ret_from_fork+0x2c/0x40
[  130.213340] Code: 48 85 c0 74 0d 44 89 e6 48 89 df ff d0 5b 41 5c 5d c3 48 8b bb 70 01 00 00 48 85 ff 75 0f 48 89 df e8 5d f0 ff ff 5b 41 5c 5d c3 <0f> 0b e8 51 f0 ff ff 90 eb e9 0f 1f 40 00 66 2e 0f 1f 84 00 00
[  130.234425] RIP: blk_mq_end_request+0x58/0x70 RSP: ffffc900028c3d50
[  130.241453] ---[ end trace 735162105b943c01 ]---


2. BUG: unable to handle kernel NULL pointer
   Full log: http://pastebin.com/FAPL4tH4

[  311.069298] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[  311.078048] IP: nvme_queue_rq+0x6e6/0x8cd [nvme]
[  311.083197] PGD 4720a9067
[  311.083198] PUD 470e88067
[  311.086213] PMD 0
[  311.089230]
[  311.093132] Oops: 0000 [#1] SMP
[  311.096634] Modules linked in: vfat fat intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore iTCO_wdt iTCO_vendor_support mxm_wmi intel_rapl_perf dcdbas pcspkr mei_me sg ipmi_si lpc_ich ipmi_devintf mei ipmi_msghandler shpchp acpi_power_meter wmi nfsd auth_rpcgss nfs_acl lockd dm_multipath grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci libata tg3 nvme crc32c_intel nvme_core megaraid_sas i2c_core ptp fjes pps_core dm_mirror dm_region_hash dm_log dm_mod
[  311.163881] CPU: 8 PID: 3569 Comm: fio Not tainted 4.10.0-rc8 #1
[  311.170582] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  311.179125] task: ffff88027134ad00 task.stack: ffffc90003ce8000
[  311.185730] RIP: 0010:nvme_queue_rq+0x6e6/0x8cd [nvme]
[  311.191462] RSP: 0018:ffffc90003ceb7e0 EFLAGS: 00010246
[  311.197291] RAX: 0000000000000000 RBX: ffff880472847680 RCX: 0000000000001000
[  311.205252] RDX: 0000000000000fff RSI: 00000003ca676000 RDI: ffff8804714d01d0
[  311.213212] RBP: ffffc90003ceb8c0 R08: 0000000000002000 R09: 0000000000001000
[  311.221173] R10: 0000000000001000 R11: 0000000000000000 R12: ffff88047298f800
[  311.229134] R13: 0000000000000040 R14: ffff880471864f80 R15: ffff8804714d0000
[  311.237096] FS:  00007f9b83453700(0000) GS:ffff880277d00000(0000) knlGS:0000000000000000
[  311.246125] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  311.252536] CR2: 0000000000000010 CR3: 000000047182b000 CR4: 00000000001406e0
[  311.260497] Call Trace:
[  311.263229]  ? __sbitmap_queue_get+0x2b/0xa0
[  311.267995]  ? remove_wait_queue+0x60/0x60
[  311.272564]  blk_mq_try_issue_directly+0x7e/0x100
[  311.277811]  blk_mq_make_request+0x39b/0x430
[  311.282574]  generic_make_request+0x103/0x1d0
[  311.287433]  submit_bio+0x75/0x150
[  311.291230]  submit_bh_wbc+0x141/0x180
[  311.295411]  __block_write_full_page+0x13d/0x3b0
[  311.300563]  ? I_BDEV+0x20/0x20
[  311.304065]  ? I_BDEV+0x20/0x20
[  311.307567]  block_write_full_page+0xe5/0x110
[  311.312427]  blkdev_writepage+0x18/0x20
[  311.316708]  __writepage+0x13/0x40
[  311.320504]  write_cache_pages+0x26f/0x510
[  311.325072]  ? compound_head+0x20/0x20
[  311.329254]  generic_writepages+0x51/0x80
[  311.333728]  blkdev_writepages+0x2f/0x40
[  311.338104]  do_writepages+0x1e/0x30
[  311.342090]  __filemap_fdatawrite_range+0xc6/0x100
[  311.347434]  filemap_write_and_wait+0x3d/0x80
[  311.352295]  __sync_blockdev+0x1f/0x40
[  311.356476]  __blkdev_put+0x74/0x290
[  311.360463]  blkdev_put+0x4c/0x120
[  311.364257]  blkdev_close+0x25/0x30
[  311.368149]  __fput+0xe7/0x210
[  311.371555]  ____fput+0xe/0x10
[  311.374963]  task_work_run+0x83/0xb0
[  311.378952]  exit_to_usermode_loop+0x66/0x92
[  311.383716]  do_syscall_64+0x165/0x180
[  311.387899]  entry_SYSCALL64_slow_path+0x25/0x25
[  311.393050] RIP: 0033:0x7f9b9e4a54fd
[  311.397035] RSP: 002b:00007f9b83452c40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[  311.405482] RAX: 0000000000000000 RBX: 00007f9b9ccb1890 RCX: 00007f9b9e4a54fd
[  311.413443] RDX: 00007f9b9f3ee000 RSI: 0000000000000080 RDI: 0000000000000034
[  311.421404] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000df1
[  311.429365] R10: 6e493d726f727265 R11: 0000000000000293 R12: 0000000000000000
[  311.437326] R13: 00007f9b87e3fc00 R14: 0000000000000000 R15: 00007f9b87e4d0c0
[  311.445287] Code: 8b bd 70 ff ff ff 89 95 50 ff ff ff 89 8d 58 ff ff ff 44 89 95 60 ff ff ff e8 c7 f6 2c e1 8b 95 50 ff ff ff 48 89 85 68 ff ff ff <4c> 8b 48 10 44 8b 58 18 8b 8d 58 ff ff ff 44 8b 95 60 ff ff ff
[  311.466366] RIP: nvme_queue_rq+0x6e6/0x8cd [nvme] RSP: ffffc90003ceb7e0
[  311.473745] CR2: 0000000000000010
[  311.477466] ---[ end trace e734cece2a2f4159 ]---

3. WARNING: CPU: 8 PID: 3565 at lib/list_debug.c:28 __list_add_valid+0x91/0xa0
   Full log: http://pastebin.com/9Tn59pLh

[  391.631349] WARNING: CPU: 8 PID: 3565 at lib/list_debug.c:28 __list_add_valid+0x91/0xa0
[  391.640294] list_add corruption. prev->next should be next (ffffc90003bb7ab0), but was ffff880471c84800. (prev=ffff880471c84800).
[  391.653304] Modules linked in: vfat fat intel_rapl sb_edac edac_core ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me iTCO_wdt intel_cstate iTCO_vendor_support intel_uncore mei mxm_wmi dcdbas intel_rapl_perf lpc_ich sg pcspkr ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci libata tg3 crc32c_intel nvme nvme_core megaraid_sas ptp i2c_core fjes pps_core dm_mirror dm_region_hash dm_log dm_mod
[  391.720572] CPU: 8 PID: 3565 Comm: main.sh Not tainted 4.10.0-rc8 #1
[  391.727660] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  391.736203] Call Trace:
[  391.738933]  dump_stack+0x63/0x87
[  391.742622]  __warn+0xd1/0xf0
[  391.745929]  warn_slowpath_fmt+0x5f/0x80
[  391.750306]  __list_add_valid+0x91/0xa0
[  391.754585]  blk_mq_make_request+0x36a/0x430
[  391.759348]  generic_make_request+0x103/0x1d0
[  391.764207]  submit_bio+0x75/0x150
[  391.768001]  submit_bh_wbc+0x141/0x180
[  391.772182]  __block_write_full_page+0x13d/0x3b0
[  391.777332]  ? I_BDEV+0x20/0x20
[  391.780834]  ? I_BDEV+0x20/0x20
[  391.784336]  block_write_full_page+0xe5/0x110
[  391.789197]  blkdev_writepage+0x18/0x20
[  391.793478]  __writepage+0x13/0x40
[  391.797271]  write_cache_pages+0x26f/0x510
[  391.801838]  ? compound_head+0x20/0x20
[  391.806021]  generic_writepages+0x51/0x80
[  391.810492]  blkdev_writepages+0x2f/0x40
[  391.814868]  do_writepages+0x1e/0x30
[  391.818854]  __filemap_fdatawrite_range+0xc6/0x100
[  391.824198]  filemap_write_and_wait+0x3d/0x80
[  391.829058]  __sync_blockdev+0x1f/0x40
[  391.833239]  fsync_bdev+0x44/0x50
[  391.836938]  invalidate_partition+0x24/0x50
[  391.841603]  del_gendisk+0xce/0x290
[  391.845496]  nvme_ns_remove+0x6a/0xd0 [nvme_core]
[  391.850744]  nvme_remove_namespaces+0x32/0x50 [nvme_core]
[  391.856768]  nvme_uninit_ctrl+0x2d/0xa0 [nvme_core]
[  391.862211]  nvme_remove+0x5d/0x130 [nvme]
[  391.866782]  pci_device_remove+0x39/0xc0
[  391.871161]  device_release_driver_internal+0x141/0x1f0
[  391.876989]  device_release_driver+0x12/0x20
[  391.881751]  pci_stop_bus_device+0x8c/0xa0
[  391.886319]  pci_stop_and_remove_bus_device_locked+0x1a/0x30
[  391.892634]  remove_store+0x7c/0x90
[  391.896524]  dev_attr_store+0x18/0x30
[  391.900610]  sysfs_kf_write+0x3a/0x50
[  391.904693]  kernfs_fop_write+0xfc/0x180
[  391.909071]  __vfs_write+0x37/0x160
[  391.912962]  ? selinux_file_permission+0xe5/0x120
[  391.918209]  ? security_file_permission+0x3b/0xc0
[  391.923457]  vfs_write+0xb2/0x1b0
[  391.927155]  ? syscall_trace_enter+0x1d0/0x2b0
[  391.932113]  SyS_write+0x55/0xc0
[  391.935712]  do_syscall_64+0x67/0x180
[  391.939799]  entry_SYSCALL64_slow_path+0x25/0x25
[  391.944948] RIP: 0033:0x7fc64e087c60
[  391.948933] RSP: 002b:00007ffdc3b8c6b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  391.957378] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fc64e087c60
[  391.965339] RDX: 0000000000000002 RSI: 00007fc64e9a4000 RDI: 0000000000000001
[  391.973299] RBP: 00007fc64e9a4000 R08: 000000000000000a R09: 00007fc64e99f740
[  391.981260] R10: 0000000000000001 R11: 0000000000000246 R12: 00007fc64e35a400
[  391.989220] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000
[  391.997191] ---[ end trace 14051aa2857ac86e ]---

4. WARNING: CPU: 9 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x103/0x130
   Full log: http://pastebin.com/sLmaVMcN

[  558.703723] WARNING: CPU: 9 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x103/0x130
[  558.714618] percpu ref (blk_queue_usage_counter_release) <= 0 (-1) after switching to atomic
[  558.714619] Modules linked in: vfat fat ipmi_ssif intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass iTCO_wdt crct10dif_pclmul crc32_pclmul iTCO_vendor_support ghash_clmulni_intel mxm_wmi mei_me dcdbas intel_cstate mei ipmi_si pcspkr intel_uncore intel_rapl_perf sg ipmi_devintf lpc_ich ipmi_msghandler wmi acpi_power_meter shpchp nfsd auth_rpcgss nfs_acl lockd dm_multipath grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel ahci libahci nvme libata nvme_core tg3 megaraid_sas i2c_core ptp fjes pps_core dm_mirror dm_region_hash dm_log dm_mod
[  558.791297] CPU: 9 PID: 0 Comm: swapper/9 Not tainted 4.10.0-rc8 #1
[  558.798291] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.2.5 09/06/2016
[  558.806835] Call Trace:
[  558.809563]  <IRQ>
[  558.811814]  dump_stack+0x63/0x87
[  558.815514]  __warn+0xd1/0xf0
[  558.818825]  warn_slowpath_fmt+0x5f/0x80
[  558.823208]  ? blk_put_queue+0x20/0x20
[  558.827395]  percpu_ref_switch_to_atomic_rcu+0x103/0x130
[  558.833332]  rcu_process_callbacks+0x20b/0x5d0
[  558.838297]  __do_softirq+0xd1/0x2a2
[  558.842293]  irq_exit+0xe9/0x100
[  558.845896]  smp_apic_timer_interrupt+0x3d/0x50
[  558.850954]  apic_timer_interrupt+0x93/0xa0
[  558.855618] RIP: 0010:cpuidle_enter_state+0xe4/0x260
[  558.861159] RSP: 0018:ffffc9000014fe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[  558.869600] RAX: ffff88047fd18f80 RBX: ffff88047fd21200 RCX: 000000000000001f
[  558.877564] RDX: 0000000000000000 RSI: ffff88047fd16658 RDI: 0000000000000000
[  558.885528] RBP: ffffc9000014fea0 R08: 0000000000000001 R09: cccccccccccccccd
[  558.893491] R10: 0000000000002686 R11: 0000000000000008 R12: 0000000000000004
[  558.901456] R13: 0000000000000009 R14: ffffffff81ce2a20 R15: 00000082146f083e
[  558.909422]  </IRQ>
[  558.911764]  ? cpuidle_enter_state+0xc0/0x260
[  558.916626]  cpuidle_enter+0x17/0x20
[  558.920617]  call_cpuidle+0x23/0x40
[  558.924510]  do_idle+0x172/0x200
[  558.928110]  cpu_startup_entry+0x71/0x80
[  558.932491]  start_secondary+0x154/0x190
[  558.936869]  start_cpu+0x14/0x14
[  558.940477] ---[ end trace 52811962d031d1f1 ]---

Best Regards,
  Yi Zhang





More information about the Linux-nvme mailing list