lockdep WARNING at blktests block/011
Tetsuo Handa
penguin-kernel at I-love.SAKURA.ne.jp
Wed Oct 5 03:00:30 PDT 2022
On 2022/10/05 17:31, Shinichiro Kawasaki wrote:
> @@ -5120,11 +5120,27 @@ EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
>  void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
>  {
>  	struct nvme_ns *ns;
> +	LIST_HEAD(splice);
>
> -	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	/*
> +	 * blk_sync_queue() calls inside the ctrl->namespaces_rwsem critical
> +	 * section trigger a lockdep deadlock warning, since cancel_work_sync()
> +	 * in blk_sync_queue() waits for nvme_timeout() work completion, which
> +	 * may lock ctrl->namespaces_rwsem. To avoid the deadlock possibility,
> +	 * call blk_sync_queue() outside the critical section by moving the
> +	 * ctrl->namespaces list elements to the stack list head temporarily.
> +	 */
> +
> +	down_write(&ctrl->namespaces_rwsem);
> +	list_splice_init(&ctrl->namespaces, &splice);
> +	up_write(&ctrl->namespaces_rwsem);
Does this work?

Making ctrl->namespaces empty while blk_sync_queue() is being called means
that e.g. a concurrent nvme_start_freeze() cannot find any namespaces to
freeze, doesn't it?
blk_mq_timeout_work(work) { // This is what blocks __flush_work() from cancel_work_sync().
  blk_mq_queue_tag_busy_iter(blk_mq_check_expired) {
    bt_for_each(blk_mq_check_expired) == blk_mq_check_expired() {
      blk_mq_rq_timed_out() {
        req->q->mq_ops->timeout(req) == nvme_timeout(req) {
          nvme_dev_disable() {
            mutex_lock(&dev->shutdown_lock); // Holds dev->shutdown_lock.
            nvme_start_freeze(&dev->ctrl) {
              down_read(&ctrl->namespaces_rwsem); // Holds ctrl->namespaces_rwsem, which might block.
              //blk_freeze_queue_start(ns->queue); // <= Never called because ctrl->namespaces is empty.
              up_read(&ctrl->namespaces_rwsem);
            }
            mutex_unlock(&dev->shutdown_lock);
          }
        }
      }
    }
  }
}
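
For reference, nvme_start_freeze() in drivers/nvme/host/core.c is roughly the
following (simplified, from memory of the code around that time), so with an
empty ctrl->namespaces list the loop body simply never runs:

void nvme_start_freeze(struct nvme_ctrl *ctrl)
{
	struct nvme_ns *ns;

	/* Walks ctrl->namespaces under the reader side of the rwsem. */
	down_read(&ctrl->namespaces_rwsem);
	list_for_each_entry(ns, &ctrl->namespaces, list)
		blk_freeze_queue_start(ns->queue);
	up_read(&ctrl->namespaces_rwsem);
}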
Are you sure that down_read(&ctrl->namespaces_rwsem) users won't run while
ctrl->namespaces is temporarily made empty? (And if you are sure that
down_read(&ctrl->namespaces_rwsem) users won't run while ctrl->namespaces is
temporarily made empty, why does ctrl->namespaces_rwsem need to be an rwsem
rather than a plain mutex or spinlock?)
> +
> +	list_for_each_entry(ns, &splice, list)
>  		blk_sync_queue(ns->queue);
> -	up_read(&ctrl->namespaces_rwsem);
> +
> +	down_write(&ctrl->namespaces_rwsem);
> +	list_splice(&splice, &ctrl->namespaces);
> +	up_write(&ctrl->namespaces_rwsem);
>  }
>  EXPORT_SYMBOL_GPL(nvme_sync_io_queues);
I don't know the whole dependency chain, but you might be able to add a
"struct nvme_ctrl"->sync_io_queue_mutex that is held in order to serialize
nvme_sync_io_queues() against down_write(&ctrl->namespaces_rwsem) users.

If we can guarantee that the ctrl->namespaces_rwsem => ctrl->sync_io_queue_mutex
ordering is impossible, nvme_sync_io_queues() can use ctrl->sync_io_queue_mutex
rather than ctrl->namespaces_rwsem, and the down_write(&ctrl->namespaces_rwsem)/
up_write(&ctrl->namespaces_rwsem) users would be replaced with

  mutex_lock(&ctrl->sync_io_queue_mutex);
  down_write(&ctrl->namespaces_rwsem);

and

  up_write(&ctrl->namespaces_rwsem);
  mutex_unlock(&ctrl->sync_io_queue_mutex);

sequences respectively.
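
A minimal sketch of that idea (sync_io_queue_mutex here is a hypothetical new
member of "struct nvme_ctrl", not something that exists in the driver today):

/* Hypothetical new member, declared next to namespaces_rwsem: */
struct mutex sync_io_queue_mutex;

void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
{
	struct nvme_ns *ns;

	/*
	 * Walk ctrl->namespaces without taking namespaces_rwsem; list
	 * modification is excluded because every
	 * down_write(&ctrl->namespaces_rwsem) user would take
	 * sync_io_queue_mutex first, as in the sequences above.
	 */
	mutex_lock(&ctrl->sync_io_queue_mutex);
	list_for_each_entry(ns, &ctrl->namespaces, list)
		blk_sync_queue(ns->queue);
	mutex_unlock(&ctrl->sync_io_queue_mutex);
}

That way cancel_work_sync() in blk_sync_queue() no longer runs inside a
ctrl->namespaces_rwsem critical section, which is what lockdep complains about.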