lockdep WARNING at blktests block/011
Tetsuo Handa
penguin-kernel at I-love.SAKURA.ne.jp
Fri Sep 30 04:06:10 PDT 2022
On 2022/09/30 9:19, Shinichiro Kawasaki wrote:
> Kernel v6.0-rc7 triggered the WARN "possible circular locking dependency
> detected" at blktests test case block/011 for NVMe devices [1]. The trigger
> commit was c0feea594e05 ("workqueue: don't skip lockdep work dependency in
> cancel_work_sync()"). The commit was backported to the v5.15.71 stable kernel,
> and the WARN is observed with that stable kernel also. It looks like the
> possible deadlock scenario has been hidden for a long time, and the commit has
> now revealed it.
Right.
> I tried to understand the deadlock scenario, but cannot tell whether it is real
> or a false positive. Help to understand the scenario would be appreciated. The
> lockdep splat mentions three locks (ctrl->namespaces_rwsem, dev->shutdown_lock
> and q->timeout_work).
Right.
This is a circular locking problem that involves three locks:
(A) (work_completion)(&q->timeout_work) --> &dev->shutdown_lock
(B) &dev->shutdown_lock --> &ctrl->namespaces_rwsem
(C) &ctrl->namespaces_rwsem --> (work_completion)(&q->timeout_work)
(A) and (B) happen due to

INIT_WORK(&q->timeout_work, blk_mq_timeout_work);

blk_mq_timeout_work(work) { // Is blocking __flush_work() from cancel_work_sync().
  blk_mq_queue_tag_busy_iter(blk_mq_check_expired) {
    bt_for_each(blk_mq_check_expired) == blk_mq_check_expired() {
      blk_mq_rq_timed_out() {
        req->q->mq_ops->timeout(req) == nvme_timeout(req) {
          nvme_dev_disable() {
            mutex_lock(&dev->shutdown_lock); // Holds dev->shutdown_lock
            nvme_start_freeze(&dev->ctrl) {
              down_read(&ctrl->namespaces_rwsem); // Takes ctrl->namespaces_rwsem for read; might block
              up_read(&ctrl->namespaces_rwsem);
            }
            mutex_unlock(&dev->shutdown_lock);
          }
        }
      }
    }
  }
}
(C) happens due to

nvme_sync_queues(ctrl) {
  nvme_sync_io_queues(ctrl) {
    down_read(&ctrl->namespaces_rwsem); // Takes ctrl->namespaces_rwsem for read; might block
    /*** The question comes here. Please check the last block in this mail. ***/
    blk_sync_queue(ns->queue) {
      cancel_work_sync(&q->timeout_work) {
        __flush_work(&q->timeout_work, true) {
          // Might wait for completion of blk_mq_timeout_work() with ctrl->namespaces_rwsem held.
        }
      }
    }
    up_read(&ctrl->namespaces_rwsem);
  }
}
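
To see why lockdep treats this cycle as a deadlock candidate, here is a minimal
userspace sketch of it (my own illustration, not NVMe code): the three locks are
compressed into three plain mutexes, ignoring for the moment that
ctrl->namespaces_rwsem is only taken for read (that subtlety is addressed
below). The names work_m/shut_m/ns_m are hypothetical stand-ins. Built with
"gcc -pthread", the three threads deadlock deterministically.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for the three locks in the splat:
 *   work_m ~ (work_completion)(&q->timeout_work)
 *   shut_m ~ &dev->shutdown_lock
 *   ns_m   ~ &ctrl->namespaces_rwsem
 */
static pthread_mutex_t work_m = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t shut_m = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ns_m   = PTHREAD_MUTEX_INITIALIZER;

static void *edge_a(void *arg) /* (A): work -> shutdown_lock */
{
  pthread_mutex_lock(&work_m);
  sleep(1);                    /* let the others take their first lock */
  printf("A: holding work, wants shutdown_lock\n");
  pthread_mutex_lock(&shut_m); /* blocks: B holds it */
  pthread_mutex_unlock(&shut_m);
  pthread_mutex_unlock(&work_m);
  return NULL;
}

static void *edge_b(void *arg) /* (B): shutdown_lock -> namespaces_rwsem */
{
  pthread_mutex_lock(&shut_m);
  sleep(1);
  printf("B: holding shutdown_lock, wants namespaces_rwsem\n");
  pthread_mutex_lock(&ns_m);   /* blocks: C holds it */
  pthread_mutex_unlock(&ns_m);
  pthread_mutex_unlock(&shut_m);
  return NULL;
}

static void *edge_c(void *arg) /* (C): namespaces_rwsem -> work completion */
{
  pthread_mutex_lock(&ns_m);
  sleep(1);
  printf("C: holding namespaces_rwsem, wants work\n");
  pthread_mutex_lock(&work_m); /* blocks: A holds it -- the cycle closes */
  pthread_mutex_unlock(&work_m);
  pthread_mutex_unlock(&ns_m);
  return NULL;
}

int main(void)
{
  pthread_t a, b, c;

  pthread_create(&a, NULL, edge_a, NULL);
  pthread_create(&b, NULL, edge_b, NULL);
  pthread_create(&c, NULL, edge_c, NULL);
  pthread_join(a, NULL); /* never returns: A, B and C deadlock */
  pthread_join(b, NULL);
  pthread_join(c, NULL);
  return 0;
}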
> In the related function call paths, it looks like
> ctrl->namespaces_rwsem is taken only for read, so the deadlock scenario does
> not look possible to me.
Right.
> (Maybe I'm misunderstanding something...)
Although ctrl->namespaces_rwsem is a rw-sem and is held only for read in these paths,
there are also paths which take ctrl->namespaces_rwsem for write. On a rw-sem, a
down_read() does not conflict only with an active writer: once a down_write() is
queued and waiting, later down_read() calls are blocked behind it (otherwise writers
could starve). If someone is trying to take that rw-sem for write, these
down_read(&ctrl->namespaces_rwsem) calls will be blocked.
That is, it also depends on whether
  down_read(&ctrl->namespaces_rwsem);
and
  down_write(&ctrl->namespaces_rwsem);
might run in parallel.
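
For example, the following userspace sketch (again my own illustration) shows
the effect. It uses the glibc-specific writer-preference rwlock kind as a rough
stand-in for the kernel rwsem's fairness towards waiting writers; build with
"gcc -pthread". The second down_read() blocks even though the lock is only
held for read, because a down_write() is already queued.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t rwsem;

static void *writer(void *arg)
{
  sleep(1);                      /* queue up behind the first reader */
  printf("writer: down_write() blocks behind the first reader\n");
  pthread_rwlock_wrlock(&rwsem);
  printf("writer: got the write lock\n");
  pthread_rwlock_unlock(&rwsem);
  return NULL;
}

static void *second_reader(void *arg)
{
  sleep(2);                      /* arrive after the writer has queued */
  printf("second reader: down_read() blocks behind the queued writer\n");
  pthread_rwlock_rdlock(&rwsem);
  printf("second reader: got the read lock\n");
  pthread_rwlock_unlock(&rwsem);
  return NULL;
}

int main(void)
{
  pthread_t w, r;
  pthread_rwlockattr_t attr;

  pthread_rwlockattr_init(&attr);
  /* glibc extension: queue new readers behind waiting writers,
   * roughly modelling the kernel rwsem's writer fairness. */
  pthread_rwlockattr_setkind_np(&attr,
      PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
  pthread_rwlock_init(&rwsem, &attr);

  pthread_rwlock_rdlock(&rwsem); /* first reader: down_read() succeeds */
  pthread_create(&w, NULL, writer, NULL);
  pthread_create(&r, NULL, second_reader, NULL);

  sleep(3);
  printf("first reader: up_read()\n");
  pthread_rwlock_unlock(&rwsem); /* writer runs, then the second reader */

  pthread_join(w, NULL);
  pthread_join(r, NULL);
  return 0;
}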
Question:
Does someone else call down_write(&ctrl->namespaces_rwsem) immediately after
someone's down_read(&ctrl->namespaces_rwsem) succeeds?
nvme_alloc_ns(), nvme_ns_remove(), nvme_remove_invalid_namespaces() and
nvme_remove_namespaces() call down_write(&ctrl->namespaces_rwsem), and some of
them are called from work callbacks. Might those run in parallel with the work
callbacks shown above?
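
If the answer is yes, the whole scenario can be modelled in userspace as
follows (my own sketch with hypothetical names; the glibc writer-preference
rwlock again stands in for the kernel rwsem). The sync path holds the rw-sem
for read and waits for the timeout work, the timeout work's down_read() is
queued behind a down_write() from a scan/remove path, and that down_write() in
turn waits for the sync path's up_read(): all three hang. Build with
"gcc -pthread".

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t namespaces_rwsem;
static pthread_mutex_t shutdown_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models blk_mq_timeout_work() -> nvme_dev_disable() -> nvme_start_freeze(). */
static void *timeout_work(void *arg)
{
  sleep(2);                      /* arrive after the writer has queued */
  pthread_mutex_lock(&shutdown_lock);
  printf("timeout work: holding shutdown_lock, down_read()...\n");
  pthread_rwlock_rdlock(&namespaces_rwsem); /* blocks: a writer is queued */
  pthread_rwlock_unlock(&namespaces_rwsem);
  pthread_mutex_unlock(&shutdown_lock);
  return NULL;
}

/* Models e.g. nvme_ns_remove() taking the rw-sem for write. */
static void *remove_path(void *arg)
{
  sleep(1);                      /* arrive after the sync path's down_read() */
  printf("remove path: down_write()...\n");
  pthread_rwlock_wrlock(&namespaces_rwsem); /* blocks: sync path reads it */
  pthread_rwlock_unlock(&namespaces_rwsem);
  return NULL;
}

/* main() models nvme_sync_io_queues(): down_read(), then cancel_work_sync(). */
int main(void)
{
  pthread_t work, rm;
  pthread_rwlockattr_t attr;

  pthread_rwlockattr_init(&attr);
  pthread_rwlockattr_setkind_np(&attr,
      PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
  pthread_rwlock_init(&namespaces_rwsem, &attr);

  pthread_create(&work, NULL, timeout_work, NULL);
  pthread_create(&rm, NULL, remove_path, NULL);

  pthread_rwlock_rdlock(&namespaces_rwsem); /* down_read() succeeds */
  sleep(3);
  printf("sync path: waiting for the timeout work...\n");
  pthread_join(work, NULL);      /* models __flush_work(): never returns */
  pthread_rwlock_unlock(&namespaces_rwsem);
  pthread_join(rm, NULL);
  return 0;
}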