Hang at NVME Host caused by Controller reset
Krishnamraju Eraparaju
krishna2 at chelsio.com
Tue Jul 28 13:42:27 EDT 2020
Sagi,
Yes, Multipath is disabled.
This time, with the "nvme-fabrics: allow to queue requests for live queues"
patch applied, I see a hang only at blk_queue_enter():
[Jul28 17:25] INFO: task nvme:21119 blocked for more than 122 seconds.
[ +0.000061] Not tainted 5.8.0-rc7ekr+ #2
[ +0.000052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000059] nvme D14392 21119 2456 0x00004000
[ +0.000059] Call Trace:
[ +0.000110] __schedule+0x32b/0x670
[ +0.000108] schedule+0x45/0xb0
[ +0.000107] blk_queue_enter+0x1e9/0x250
[ +0.000109] ? wait_woken+0x70/0x70
[ +0.000110] blk_mq_alloc_request+0x53/0xc0
[ +0.000111] nvme_alloc_request+0x61/0x70 [nvme_core]
[ +0.000121] nvme_submit_user_cmd+0x50/0x310 [nvme_core]
[ +0.000118] nvme_user_cmd+0x12e/0x1c0 [nvme_core]
[ +0.000163] ? _copy_to_user+0x22/0x30
[ +0.000113] blkdev_ioctl+0x100/0x250
[ +0.000115] block_ioctl+0x34/0x40
[ +0.000110] ksys_ioctl+0x82/0xc0
[ +0.000109] __x64_sys_ioctl+0x11/0x20
[ +0.000109] do_syscall_64+0x3e/0x70
[ +0.000120] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000112] RIP: 0033:0x7fbe9cdbb67b
[ +0.000110] Code: Bad RIP value.
[ +0.000124] RSP: 002b:00007ffd61ff5778 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ +0.000170] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbe9cdbb67b
[ +0.000114] RDX: 00007ffd61ff5780 RSI: 00000000c0484e43 RDI: 0000000000000003
[ +0.000113] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ +0.000115] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd61ff7219
[ +0.000123] R13: 0000000000000006 R14: 00007ffd61ff5e30 R15: 000055e09c1854a0
[ +0.000115] Kernel panic - not syncing: hung_task: blocked tasks
You can easily reproduce this by running the three loops below in parallel for about 10 minutes:
while [ 1 ]; do nvme write-zeroes /dev/nvme0n1 -s 1 -c 1; done
while [ 1 ]; do echo 1 > /sys/block/nvme0n1/device/reset_controller; done
while [ 1 ]; do ifconfig enp2s0f4 down; sleep 24; ifconfig enp2s0f4 up; sleep 28; done
I am not sure whether using nvme-write this way is valid.
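If it is more convenient, the three loops can be wrapped in a single script that starts them together and stops them after a fixed time. This is only a rough sketch, not something I ran as-is: the DEV, IFACE, DURATION and DRY_RUN variables, the helper function names, and the "start" argument are made up for illustration. The commands themselves are the same as above.

```shell
#!/bin/sh
# Sketch of a combined reproducer. All variable and function names here are
# illustrative assumptions; only the three commands come from the report.
DEV=${DEV:-nvme0n1}
IFACE=${IFACE:-enp2s0f4}
DURATION=${DURATION:-600}   # seconds, roughly 10 minutes
DRY_RUN=${DRY_RUN:-0}       # set to 1 to print the commands instead of running them

# Run one command line, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The three loops from the report, one function each.
io_loop()    { while :; do run nvme write-zeroes "/dev/$DEV" -s 1 -c 1; done; }
reset_loop() { while :; do run sh -c "echo 1 > /sys/block/$DEV/device/reset_controller"; done; }
link_loop()  { while :; do run ifconfig "$IFACE" down; sleep 24; run ifconfig "$IFACE" up; sleep 28; done; }

# Start all three loops in the background, let them run, then stop them.
main() {
    io_loop & p1=$!
    reset_loop & p2=$!
    link_loop & p3=$!
    sleep "$DURATION"
    kill "$p1" "$p2" "$p3" 2>/dev/null
    wait 2>/dev/null
}

# Only start the loops when explicitly asked to, e.g. "sh repro.sh start".
if [ "${1:-}" = "start" ]; then
    main
fi
```

Run it as root with the "start" argument; DEV and IFACE can be overridden from the environment for a different namespace or interface. If the hang triggers, the script may of course never get to the cleanup step.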
Thanks,
Krishna.
On Tuesday, 07/28/20 at 08:54:18 -0700, Sagi Grimberg wrote:
>
>
> On 7/28/20 4:59 AM, Krishnamraju Eraparaju wrote:
> >Sagi,
> >With the given patch, I am no longer seeing the freeze_queue_wait hang
> >issue, but I am seeing another hang issue:
>
> The trace suggests that you are not running with multipath, right?
>
> I think you need the patch:
> [PATCH] nvme-fabrics: allow to queue requests for live queues
>
> You can find it in linux-nvme