Resets during user commands leads to hung task and controller stuck in connecting

Jonathan Derrick jonathan.derrick at linux.dev
Fri Nov 11 13:50:33 PST 2022


Hi,

I'm (again) seeing a hung task when doing resets and formats simultaneously.
The controller state is left in 'connecting'.

Using nvme.git/nvme-6.2 as of 'nvme: implement the DEAC bit for the Write Zeroes command',
but I have also reproduced it with Christoph's latest reset/probe-split set.


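# Repro: race a namespace format against a controller reset in a tight loop.
ctrl="nvme0"
nsid=1
pci="/sys/block/${ctrl}n${nsid}/device"
# Report tasks that stay blocked for more than 30 seconds.
echo 30 > /proc/sys/kernel/hung_task_timeout_secs
while true; do
        # Both commands are backgrounded on purpose so that they overlap.
        nvme format -f "/dev/${ctrl}n${nsid}" &
        echo 1 > "$pci/reset_controller" &
done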
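
A minimal sketch for checking whether the controller ever leaves 'connecting' once the loop above is stopped. It assumes the controller state is readable at /sys/class/nvme/<ctrl>/state. The helper names nvme_ctrl_state and nvme_wait_live are hypothetical and are not part of nvme-cli; the sysfs directory is a parameter only so the helpers can be pointed somewhere else.

```shell
# Hypothetical helper: print the controller state string from sysfs.
# $1 = controller name (e.g. nvme0)
# $2 = sysfs class directory (assumed default: /sys/class/nvme)
nvme_ctrl_state() {
    cat "${2:-/sys/class/nvme}/$1/state"
}

# Hypothetical helper: poll until the controller reports "live".
# $1 = controller name, $2 = maximum number of polls,
# $3 = seconds to sleep between polls, $4 = sysfs class directory.
# Returns 0 as soon as the state is "live", 1 if it never gets there.
nvme_wait_live() {
    i=0
    state=""
    while [ "$i" -lt "$2" ]; do
        state=$(nvme_ctrl_state "$1" "$4")
        [ "$state" = "live" ] && return 0
        sleep "$3"
        i=$((i + 1))
    done
    echo "$1 still in state '$state' after $2 polls" >&2
    return 1
}
```

For example, `nvme_wait_live nvme0 30 1` polls once per second for up to 30 seconds and returns non-zero if the controller is still not live at the end.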


[   79.195862] nvme nvme0: Ignoring bogus Namespace Identifiers
[  122.378580] INFO: task sh:7737 blocked for more than 30 seconds.
[  122.380329]       Not tainted 6.1.0-rc2+ #87
[  122.381594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  122.383782] task:sh              state:D stack:0     pid:7737  ppid:1      flags:0x00000004
[  122.386078] Call Trace:
[  122.386909]  <TASK>
[  122.387659]  __schedule+0x320/0xb10
[  122.388772]  ? lock_release+0x22b/0x450
[  122.389920]  ? lock_acquired+0x1a2/0x400
[  122.391094]  ? wait_for_completion+0x83/0x160
[  122.392358]  schedule+0x53/0xd0
[  122.393337]  schedule_timeout+0x310/0x3b0
[  122.394517]  ? rcu_read_lock_held_common+0xe/0x50
[  122.395847]  ? rcu_read_lock_sched_held+0x23/0x80
[  122.397189]  ? lock_release+0x22b/0x450
[  122.398336]  ? lock_acquired+0x1a2/0x400
[  122.399484]  ? wait_for_completion+0x83/0x160
[  122.400763]  wait_for_completion+0xb5/0x160
[  122.401968]  __flush_work+0x293/0x4a0
[  122.403068]  ? flush_workqueue_prep_pwqs+0x120/0x120
[  122.404463]  ? rcu_read_lock_sched_held+0x23/0x80
[  122.405791]  ? trace_hardirqs_on+0x2b/0xd0
[  122.406977]  nvme_reset_ctrl_sync+0x2a/0x40 [nvme_core]
[  122.408473]  nvme_sysfs_reset+0x12/0x30 [nvme_core]
[  122.409856]  kernfs_fop_write_iter+0x142/0x1e0
[  122.411137]  vfs_write+0x357/0x4f0
[  122.412161]  ksys_write+0x5f/0xe0
[  122.413172]  do_syscall_64+0x3a/0x90
[  122.414238]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  122.415646] RIP: 0033:0x7f7980ced648
[  122.416710] RSP: 002b:00007ffc01ee6778 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  122.418794] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f7980ced648
[  122.420673] RDX: 0000000000000002 RSI: 0000563b40d93ca0 RDI: 0000000000000001
[  122.422550] RBP: 0000563b40d93ca0 R08: 000000000000000a R09: 00007f7980d3cda0
[  122.424385] R10: 000000000000000a R11: 0000000000000246 R12: 00007f7980fc06e0
[  122.426232] R13: 0000000000000002 R14: 00007f7980fbb880 R15: 0000000000000002
[  122.428067]  </TASK>
[  122.428821] INFO: lockdep is turned off.




