[PATCH 3/4] nvmet: fix hang in nvmet_ns_disable()
Taehee Yoo
ap420073 at gmail.com
Wed Jan 4 00:56:41 PST 2023
Hi Chaitanya,
Thank you so much for your review!
On 1/4/23 09:32, Chaitanya Kulkarni wrote:
> On 1/3/23 02:03, Taehee Yoo wrote:
>> nvme target namespace is enabled or disabled by nvmet_ns_enable() or
>> nvmet_ns_disable().
>> The subsys->lock prevents the namespace data from being used while
>> nvmet_ns_enable() or nvmet_ns_disable() is running.
>> The ns->enabled boolean prevents the namespace data from being used in a
>> wrong state, such as an uninitialized state.
>>
>> nvmet_ns_disable() acquires subsys->lock and sets ns->enabled to false.
>> Then, it releases subsys->lock while it waits for the ns->disable_done
>> completion.
>> At this point, nvmet_ns_enable() can run concurrently, and it calls
>> percpu_ref_init().
>> So, ns->disable_done will never be completed, and a hang occurs.
>>
>>    CPU0                                      CPU1
>>    nvmet_ns_disable();
>>      mutex_lock(&subsys->lock);              nvmet_ns_enable();
>>                                                mutex_lock(&subsys->lock);
>>      ns->enabled = false;
>>      mutex_unlock(&subsys->lock);
>>                                                percpu_ref_init();
>>      wait_for_completion(&ns->disable_done); <-- infinite wait
>>
>>      mutex_lock(&subsys->lock);
>>      mutex_unlock(&subsys->lock);
>>
>> INFO: task bash:926 blocked for more than 30 seconds.
>> Tainted: G W 6.1.0+ #17
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:bash state:D stack:27200 pid:926 ppid:911 flags:0x00004000
>> Call Trace:
>> <TASK>
>> __schedule+0xafc/0x2930
>> ? io_schedule_timeout+0x160/0x160
>> ? _raw_spin_unlock_irq+0x24/0x50
>> ? __wait_for_common+0x39b/0x5c0
>> ? usleep_range_state+0x190/0x190
>> schedule+0x130/0x230
>> schedule_timeout+0x18a/0x240
>> ? usleep_range_state+0x190/0x190
>> ? rcu_read_lock_sched_held+0x12/0x80
>> ? lock_downgrade+0x700/0x700
>> ? do_raw_spin_trylock+0xb5/0x180
>> ? lock_contended+0xdf0/0xdf0
>> ? _raw_spin_unlock_irq+0x24/0x50
>> ? trace_hardirqs_on+0x3c/0x190
>> __wait_for_common+0x1ca/0x5c0
>> ? usleep_range_state+0x190/0x190
>> ? bit_wait_io+0xf0/0xf0
>> ? _raw_spin_unlock_irqrestore+0x59/0x70
>> nvmet_ns_disable+0x288/0x490
>> ? nvmet_ns_enable+0x970/0x970
>> ? lockdep_hardirqs_on_prepare+0x410/0x410
>> ? rcu_read_lock_sched_held+0x12/0x80
>> ? configfs_write_iter+0x1df/0x480
>> ? nvmet_ns_revalidate_size_store+0x220/0x220
>> nvmet_ns_enable_store+0x85/0xe0
>> [ ... ]
>>
>> Fixes: a07b4970f464 ("nvmet: add a generic NVMe target")
>> Signed-off-by: Taehee Yoo <ap420073 at gmail.com>
>
> [...]
>
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index 89bedfcd974c..e609787577c6 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -56,6 +56,12 @@
>> #define IPO_IATTR_CONNECT_SQE(x) \
>> (cpu_to_le32(offsetof(struct nvmf_connect_command, x)))
>>
>> +enum nvmet_ns_state {
>> + NVMET_NS_ENABLED,
>> + NVMET_NS_DISABLING,
>> + NVMET_NS_DISABLED
>> +};
>> +
>
> The patch looks good to me, but I'm wondering if there is a way
> we can do this without adding new enable disable states ?
>
I'm not sure, but a workqueue would be a possible way to do that.
The point is to make enable() and disable() run exclusively (serially),
and a workqueue would execute the enable()/disable() functions serially.
Thanks a lot,
Taehee Yoo