lockdep WARNING at blktests block/011
Tetsuo Handa
penguin-kernel at I-love.SAKURA.ne.jp
Wed Oct 5 03:00:30 PDT 2022
On 2022/10/05 17:31, Shinichiro Kawasaki wrote:
> @@ -5120,11 +5120,27 @@ EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
>  void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
>  {
>  	struct nvme_ns *ns;
> +	LIST_HEAD(splice);
>
> -	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	/*
> +	 * blk_sync_queue() calls inside the ctrl->namespaces_rwsem critical
> +	 * section trigger a lockdep deadlock warning, since cancel_work_sync()
> +	 * in blk_sync_queue() waits for nvme_timeout() work completion, which
> +	 * may lock ctrl->namespaces_rwsem. To avoid the deadlock possibility,
> +	 * call blk_sync_queue() outside the critical section by moving the
> +	 * ctrl->namespaces list elements to the stack list head temporarily.
> +	 */
> +
> +	down_write(&ctrl->namespaces_rwsem);
> +	list_splice_init(&ctrl->namespaces, &splice);
> +	up_write(&ctrl->namespaces_rwsem);
Does this work?

Making ctrl->namespaces empty while blk_sync_queue() is being called means
that e.g. a concurrent nvme_start_freeze() cannot find any namespaces to
freeze, doesn't it?
blk_mq_timeout_work(work) { // This is what blocks __flush_work() from cancel_work_sync().
  blk_mq_queue_tag_busy_iter(blk_mq_check_expired) {
    bt_for_each(blk_mq_check_expired) == blk_mq_check_expired() {
      blk_mq_rq_timed_out() {
        req->q->mq_ops->timeout(req) == nvme_timeout(req) {
          nvme_dev_disable() {
            mutex_lock(&dev->shutdown_lock); // Holds dev->shutdown_lock.
            nvme_start_freeze(&dev->ctrl) {
              down_read(&ctrl->namespaces_rwsem); // Holds ctrl->namespaces_rwsem, which might block.
              //blk_freeze_queue_start(ns->queue); // <= Never called because ctrl->namespaces is empty.
              up_read(&ctrl->namespaces_rwsem);
            }
            mutex_unlock(&dev->shutdown_lock);
          }
        }
      }
    }
  }
}
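
For reference, nvme_start_freeze() in drivers/nvme/host/core.c is roughly the
following (simplified, from memory of the code around that time), so with an
empty ctrl->namespaces list the loop body simply never runs:

void nvme_start_freeze(struct nvme_ctrl *ctrl)
{
	struct nvme_ns *ns;

	/* Walks ctrl->namespaces under the reader side of the rwsem. */
	down_read(&ctrl->namespaces_rwsem);
	list_for_each_entry(ns, &ctrl->namespaces, list)
		blk_freeze_queue_start(ns->queue);
	up_read(&ctrl->namespaces_rwsem);
}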
Are you sure that down_read(&ctrl->namespaces_rwsem) users won't run while
ctrl->namespaces is temporarily made empty? (And if you are sure that
down_read(&ctrl->namespaces_rwsem) users won't run while ctrl->namespaces is
temporarily made empty, why does ctrl->namespaces_rwsem need to be an rwsem
rather than a plain mutex or spinlock?)
> +
> +	list_for_each_entry(ns, &splice, list)
>  		blk_sync_queue(ns->queue);
> -	up_read(&ctrl->namespaces_rwsem);
> +
> +	down_write(&ctrl->namespaces_rwsem);
> +	list_splice(&splice, &ctrl->namespaces);
> +	up_write(&ctrl->namespaces_rwsem);
>  }
>  EXPORT_SYMBOL_GPL(nvme_sync_io_queues);
I don't know the whole dependency chain, but you might be able to add a
"struct nvme_ctrl"->sync_io_queue_mutex that is held in order to serialize
nvme_sync_io_queues() against down_write(&ctrl->namespaces_rwsem) users.

If we can guarantee that the ctrl->namespaces_rwsem => ctrl->sync_io_queue_mutex
ordering is impossible, nvme_sync_io_queues() can use ctrl->sync_io_queue_mutex
rather than ctrl->namespaces_rwsem, and the down_write(&ctrl->namespaces_rwsem)/
up_write(&ctrl->namespaces_rwsem) users would be replaced with

  mutex_lock(&ctrl->sync_io_queue_mutex);
  down_write(&ctrl->namespaces_rwsem);

and

  up_write(&ctrl->namespaces_rwsem);
  mutex_unlock(&ctrl->sync_io_queue_mutex);

sequences respectively.
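
A minimal sketch of that idea (sync_io_queue_mutex here is a hypothetical new
member of "struct nvme_ctrl", not something that exists in the driver today):

/* Hypothetical new member, declared next to namespaces_rwsem: */
struct mutex sync_io_queue_mutex;

void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
{
	struct nvme_ns *ns;

	/*
	 * Walk ctrl->namespaces without taking namespaces_rwsem; list
	 * modification is excluded because every
	 * down_write(&ctrl->namespaces_rwsem) user would take
	 * sync_io_queue_mutex first, as in the sequences above.
	 */
	mutex_lock(&ctrl->sync_io_queue_mutex);
	list_for_each_entry(ns, &ctrl->namespaces, list)
		blk_sync_queue(ns->queue);
	mutex_unlock(&ctrl->sync_io_queue_mutex);
}

That way cancel_work_sync() in blk_sync_queue() no longer runs inside a
ctrl->namespaces_rwsem critical section, which is what lockdep complains about.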