Another wierd deadlock with nvme-tcp
Hannes Reinecke
hare at suse.de
Fri Oct 29 05:04:48 PDT 2021
Hi Sagi,
and I've run into another weird deadlock; this time it's nvme-tcp not
flushing timed out commands when deleting the controller:
[ 1685.982355] nvme nvme0: Removing ctrl: NQN
"nqn.2014-08.org.nvmexpress:uuid:62f37f51-0cc7-46d5-9865-4de22e81bd9d"
[ 1688.533746] nvme nvme0: queue 2: timeout request 0x72 type 4
[ 1688.533781] nvme nvme0: failed to send request -104
[ 1719.254055] nvme nvme0: queue 2: timeout request 0x73 type 4
[ 1719.254091] nvme nvme0: failed to send request -32
[ 1749.974336] nvme nvme0: queue 2: timeout request 0x74 type 4
[ 1749.974367] nvme nvme0: failed to send request -32
[ .. ]
[ 2087.897536] nvme nvme0: queue 2: timeout request 0x7f type 4
[ 2087.897568] nvme nvme0: failed to send request -32
[ 2118.617840] nvme nvme0: queue 2: timeout request 0x71 type 4
[ 2118.617878] nvme nvme0: failed to send request -32
[ 2149.338133] nvme nvme0: queue 2: timeout request 0x72 type 4
[ 2149.338168] nvme nvme0: failed to send request -32
[ 2180.058428] nvme nvme0: queue 2: timeout request 0x73 type 4
[ 2180.058468] nvme nvme0: failed to send request -32
[ 2210.779140] nvme nvme0: queue 2: timeout request 0x74 type 4
[ 2210.779171] nvme nvme0: failed to send request -32
[ 2241.499022] nvme nvme0: queue 2: timeout request 0x75 type 4
[ 2241.499061] nvme nvme0: failed to send request -32
[ 2272.219324] nvme nvme0: queue 2: timeout request 0x76 type 4
[ 2272.219370] nvme nvme0: failed to send request -32
[ 2302.939635] nvme nvme0: queue 2: timeout request 0x77 type 4
[ 2302.939674] nvme nvme0: failed to send request -32
[ 2333.659936] nvme nvme0: queue 2: timeout request 0x78 type 4
[ 2333.659982] nvme nvme0: failed to send request -32
[ 2364.380226] nvme nvme0: queue 2: timeout request 0x79 type 4
[ 2364.380269] nvme nvme0: failed to send request -32
[ 2395.100476] nvme nvme0: queue 2: timeout request 0x7a type 4
[ 2395.100521] nvme nvme0: failed to send request -32
[ 2425.820806] nvme nvme0: queue 2: timeout request 0x7b type 4
[ 2425.820854] nvme nvme0: failed to send request -32
[ 2446.301004] INFO: task kworker/u4:4:304 blocked for more than 491
seconds.
[ 2446.301012] Tainted: G W E 5.15.0-rc6-54-default+ #867
[ 2446.301015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 2446.301016] task:kworker/u4:4 state:D stack: 0 pid: 304 ppid:
2 flags:0x00004000
[ 2446.301025] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[ 2446.301049] Call Trace:
[ 2446.301053] __schedule+0x302/0x13a0
[ 2446.301091] ? block_read_full_page+0x21e/0x3b0
[ 2446.301098] ? blkdev_write_begin+0x20/0x20
[ 2446.301104] schedule+0x3a/0xa0
[ 2446.301108] io_schedule+0x3b/0x60
[ 2446.301111] do_read_cache_page+0x49b/0x790
[ 2446.301119] ? filemap_invalidate_unlock_two+0x40/0x40
[ 2446.301123] ? sun_partition+0x370/0x370
[ 2446.301126] read_part_sector+0x32/0xe0
[ 2446.301132] ultrix_partition+0x2f/0x1c0
[ 2446.301136] bdev_disk_changed+0x1e4/0x530
[ 2446.301141] blkdev_get_whole+0x81/0x90
[ 2446.301145] blkdev_get_by_dev+0x128/0x2d0
[ 2446.301150] device_add_disk+0x370/0x3b0
[ 2446.301155] nvme_mpath_set_live+0x4e/0x190 [nvme_core]
[ 2446.301168] nvme_mpath_add_disk+0x150/0x160 [nvme_core]
[ 2446.301180] ? device_add_disk+0x280/0x3b0
[ 2446.301184] nvme_alloc_ns+0x417/0x940 [nvme_core]
[ 2446.301197] nvme_validate_or_alloc_ns+0xe9/0x1e0 [nvme_core]
[ 2446.301207] ? __nvme_submit_sync_cmd+0x195/0x200 [nvme_core]
[ 2446.301219] nvme_scan_work+0x168/0x310 [nvme_core]
[ 2446.301231] process_one_work+0x231/0x420
[ 2446.301238] worker_thread+0x2d/0x3f0
[ 2446.301242] ? process_one_work+0x420/0x420
[ 2446.301245] kthread+0x11a/0x140
[ 2446.301249] ? set_kthread_struct+0x40/0x40
[ 2446.301252] ret_from_fork+0x22/0x30
[ 2446.301285] INFO: task nvme:3254 blocked for more than 491 seconds.
[ 2446.301288] Tainted: G W E 5.15.0-rc6-54-default+ #867
[ 2446.301290] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 2446.301291] task:nvme state:D stack: 0 pid: 3254 ppid:
2010 flags:0x00000000
[ 2446.301296] Call Trace:
[ 2446.301299] __schedule+0x302/0x13a0
[ 2446.301303] ? enqueue_entity+0xf2/0x580
[ 2446.301308] schedule+0x3a/0xa0
[ 2446.301311] schedule_timeout+0x115/0x280
[ 2446.301317] ? check_preempt_curr+0x29/0x60
[ 2446.301321] ? ttwu_do_wakeup+0x17/0x170
[ 2446.301324] wait_for_completion+0x99/0x100
[ 2446.301329] __flush_work+0x159/0x1e0
[ 2446.301333] ? worker_detach_from_pool+0xb0/0xb0
[ 2446.301338] nvme_remove_namespaces+0x3d/0xf0 [nvme_core]
[ 2446.301350] nvme_do_delete_ctrl+0x4b/0x80 [nvme_core]
[ 2446.301360] nvme_sysfs_delete+0x42/0x60 [nvme_core]
[ 2446.301370] kernfs_fop_write_iter+0x12f/0x1a0
[ 2446.301376] new_sync_write+0x122/0x1b0
[ 2446.301382] vfs_write+0x1eb/0x250
[ 2446.301387] ksys_write+0xa1/0xe0
[ 2446.301391] do_syscall_64+0x3a/0x80
[ 2446.301396] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2446.301400] RIP: 0033:0x7f8a61ad7c03
[ 2446.301403] RSP: 002b:00007ffec55ad268 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 2446.301407] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
00007f8a61ad7c03
[ 2446.301409] RDX: 0000000000000001 RSI: 000055cccc861b97 RDI:
0000000000000003
[ 2446.301411] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[ 2446.301413] R10: 00007f8a619ee468 R11: 0000000000000246 R12:
000055cccd1b8200
[ 2446.301415] R13: 0000000000000003 R14: 0000000000000000 R15:
000055cccca8d420
[ 2456.541092] nvme nvme0: queue 2: timeout request 0x7c type 4
[ 2456.541135] nvme nvme0: failed to send request -32
[ 2487.261382] nvme nvme0: queue 2: timeout request 0x7d type 4
[ 2487.261423] nvme nvme0: failed to send request -32
[ 2517.981690] nvme nvme0: queue 2: timeout request 0x7e type 4
[ 2517.981730] nvme nvme0: failed to send request -32
[ 2548.701970] nvme nvme0: queue 2: timeout request 0x7f type 4
[ 2548.702011] nvme nvme0: failed to send request -32
[ 2579.422284] nvme nvme0: queue 2: timeout request 0x71 type 4
[ 2579.422354] nvme nvme0: failed to send request -32
[ 2610.142592] nvme nvme0: queue 2: timeout request 0x72 type 4
[ 2610.142633] nvme nvme0: failed to send request -32
[ 2640.862920] nvme nvme0: queue 2: timeout request 0x73 type 4
[ 2640.862961] nvme nvme0: failed to send request -32
[ 2671.583244] nvme nvme0: queue 2: timeout request 0x74 type 4
[ 2671.583285] nvme nvme0: failed to send request -32
[ 2702.303535] nvme nvme0: queue 2: timeout request 0x75 type 4
[ 2702.303578] nvme nvme0: failed to send request -32
Odd, though; looks as if we don't failover to multipath, but rather try
to resend the command directly.
Inquiries are continuing ...
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
More information about the Linux-nvme
mailing list