Deadlock on failure to read NVMe namespace

Sagi Grimberg sagi at grimberg.me
Tue Oct 19 08:06:17 PDT 2021



On 10/19/21 5:27 PM, Sagi Grimberg wrote:
> 
>>>> 481:~ # cat /proc/15761/stack
>>>> [<0>] nvme_mpath_clear_ctrl_paths+0x25/0x80 [nvme_core]
>>>> [<0>] nvme_remove_namespaces+0x31/0xf0 [nvme_core]
>>>> [<0>] nvme_do_delete_ctrl+0x4b/0x80 [nvme_core]
>>>> [<0>] nvme_sysfs_delete+0x42/0x60 [nvme_core]
>>>> [<0>] kernfs_fop_write_iter+0x12f/0x1a0
>>>> [<0>] new_sync_write+0x122/0x1b0
>>>> [<0>] vfs_write+0x1eb/0x250
>>>> [<0>] ksys_write+0xa1/0xe0
>>>> [<0>] do_syscall_64+0x3a/0x80
>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>> c481:~ # cat /proc/14965/stack
>>>> [<0>] do_read_cache_page+0x49b/0x790
>>>> [<0>] read_part_sector+0x39/0xe0
>>>> [<0>] read_lba+0xf9/0x1d0
>>>> [<0>] efi_partition+0xf1/0x7f0
>>>> [<0>] bdev_disk_changed+0x1ee/0x550
>>>> [<0>] blkdev_get_whole+0x81/0x90
>>>> [<0>] blkdev_get_by_dev+0x128/0x2e0
>>>> [<0>] device_add_disk+0x377/0x3c0
>>>> [<0>] nvme_mpath_set_live+0x130/0x1b0 [nvme_core]
>>>> [<0>] nvme_mpath_add_disk+0x150/0x160 [nvme_core]
>>>> [<0>] nvme_alloc_ns+0x417/0x950 [nvme_core]
>>>> [<0>] nvme_validate_or_alloc_ns+0xe9/0x1e0 [nvme_core]
>>>> [<0>] nvme_scan_work+0x168/0x310 [nvme_core]
>>>> [<0>] process_one_work+0x231/0x420
>>>> [<0>] worker_thread+0x2d/0x3f0
>>>> [<0>] kthread+0x11a/0x140
>>>> [<0>] ret_from_fork+0x22/0x30

...

> I think this sequence is familiar and was addressed by a fix from Anton
> (CC'd) which still has some pending review comments.
> 
> Can you lookup and try:
> [PATCH] nvme/mpath: fix hang when disk goes live over reconnect

Actually, I see the trace is coming from nvme_alloc_ns, not the ANA
update path, so that patch is unlikely to address the issue.

Looking at nvme_mpath_clear_ctrl_paths, I don't think it should
take the scan_lock anymore. IIRC the reason it needed the scan_lock
in the first place was that entries were added to ctrl->namespaces
and the list was then sorted in scan_work (taking namespaces_rwsem
twice).
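
From memory, the old pattern was roughly the below (paraphrased from
the pre-"keep the list sorted" code, so take the details with a grain
of salt):

	/* in nvme_alloc_ns(), called from scan_work */
	down_write(&ctrl->namespaces_rwsem);
	list_add_tail(&ns->list, &ctrl->namespaces);
	up_write(&ctrl->namespaces_rwsem);

	/* later in nvme_scan_work() */
	down_write(&ctrl->namespaces_rwsem);
	list_sort(NULL, &ctrl->namespaces, ns_cmp);
	up_write(&ctrl->namespaces_rwsem);

so holding namespaces_rwsem alone did not give a reader a stable,
fully sorted view while a scan was in flight, hence the scan_lock.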

But now that ctrl->namespaces is always kept sorted and is accessed
under namespaces_rwsem, I think the scan_lock is no longer needed
here and namespaces_rwsem alone is sufficient.
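
Untested, but what I have in mind is roughly the below (written
against my recollection of the current code, so treat it as a sketch
rather than a proper patch):

void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
{
	struct nvme_ns *ns;

	/* no scan_lock here anymore, namespaces_rwsem should be enough */
	down_read(&ctrl->namespaces_rwsem);
	list_for_each_entry(ns, &ctrl->namespaces, list)
		if (nvme_mpath_clear_current_path(ns))
			kblockd_schedule_work(&ns->head->requeue_work);
	up_read(&ctrl->namespaces_rwsem);
}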

Thoughts?


