lockdep WARNING at blktests block/011
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Thu Oct 6 18:36:49 PDT 2022
On Oct 06, 2022 / 02:30, Shinichiro Kawasaki wrote:
> On Oct 05, 2022 / 08:20, Keith Busch wrote:
> > On Wed, Oct 05, 2022 at 07:00:30PM +0900, Tetsuo Handa wrote:
> > > On 2022/10/05 17:31, Shinichiro Kawasaki wrote:
> > > > @@ -5120,11 +5120,27 @@ EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
> > > > void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
> > > > {
> > > > struct nvme_ns *ns;
> > > > + LIST_HEAD(splice);
> > > >
> > > > - down_read(&ctrl->namespaces_rwsem);
> > > > - list_for_each_entry(ns, &ctrl->namespaces, list)
> > > > + /*
> > > > + * blk_sync_queues() call in ctrl->snamespaces_rwsem critical section
> > > > + * triggers deadlock warning by lockdep since cancel_work_sync() in
> > > > + * blk_sync_queue() waits for nvme_timeout() work completion which may
> > > > + * lock the ctrl->snamespaces_rwsem. To avoid the deadlock possibility,
> > > > + * call blk_sync_queues() out of the critical section by moving the
> > > > + * ctrl->namespaces list elements to the stack list head temporally.
> > > > + */
> > > > +
> > > > + down_write(&ctrl->namespaces_rwsem);
> > > > + list_splice_init(&ctrl->namespaces, &splice);
> > > > + up_write(&ctrl->namespaces_rwsem);
> > >
> > > Does this work?
> > >
> > > ctrl->namespaces being empty when calling blk_sync_queue() means that
> > > e.g. nvme_start_freeze() cannot find namespaces to freeze, doesn't it?
> >
> > There can't be anything to timeout at this point. The controller is disabled
> > prior to syncing the queues. Not only is there no IO for timeout work to
> > operate on, the controller state is already disabled, so a subsequent freeze
> > would be skipped.
>
> Thank you. So, this temporary list move approach should be ok.

Keith, while I was preparing the formal patch, I noticed a path which may call
nvme_sync_io_queues() when the NVMe controller is not disabled. Quote from
drivers/nvme/host/pci.c:

static int nvme_suspend(struct device *dev)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	struct nvme_dev *ndev = pci_get_drvdata(pdev);
	struct nvme_ctrl *ctrl = &ndev->ctrl;
	/* ... */
	nvme_start_freeze(ctrl);
	nvme_wait_freeze(ctrl);
	nvme_sync_queues(ctrl);
	if (ctrl->state != NVME_CTRL_LIVE)
		goto unfreeze;

When nvme_sync_queues(ctrl) is called (which in turn calls
nvme_sync_io_queues()), ctrl->state can still be NVME_CTRL_LIVE. So I think
namespace addition or removal can happen in parallel with this nvme_suspend()
context (though this should be super rare). If this is true, the patch that
moves the namespace list to a stack list head may cause a removed (or added)
namespace to appear (or disappear) after suspend & resume. (I think the other
paths that call nvme_sync_io_queues() disable the controller first and are
fine.)
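
To make the concern concrete, here is a rough user-space sketch of the "removed namespace appears again" case. It is not kernel code: struct ns, struct ctrl, remove_ns(), count_ns() and simulate_splice_race() are all hypothetical stand-ins, the list is a plain singly linked list, and the interleaving is replayed sequentially instead of with real threads. It also assumes that the removal path looks the namespace up on ctrl->namespaces, so it cannot find an entry that has been moved to the stack list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nvme_ns. */
struct ns {
	int nsid;
	struct ns *next;
};

/* Hypothetical stand-in for struct nvme_ctrl. */
struct ctrl {
	struct ns *namespaces;	/* models ctrl->namespaces */
};

/*
 * Models a removal path that looks the namespace up on the controller
 * list (an assumption of this sketch). Returns true if it was unlinked.
 */
bool remove_ns(struct ctrl *c, int nsid)
{
	struct ns **pp;

	for (pp = &c->namespaces; *pp; pp = &(*pp)->next) {
		if ((*pp)->nsid == nsid) {
			*pp = (*pp)->next;
			return true;
		}
	}
	return false;	/* not on the controller list */
}

/* Counts the namespaces currently visible on the controller list. */
int count_ns(const struct ctrl *c)
{
	const struct ns *p;
	int n = 0;

	for (p = c->namespaces; p; p = p->next)
		n++;
	return n;
}

/*
 * Replays the interleaving: splice the list away, attempt a removal
 * while it is away, then splice it back. Returns the number of
 * namespaces visible afterwards, or -1 if the removal succeeded.
 */
int simulate_splice_race(void)
{
	struct ns ns1 = { .nsid = 1, .next = NULL };
	struct ctrl c = { .namespaces = &ns1 };
	struct ns *splice;
	bool removed;

	/* Step 1: move the whole list to a stack-local head. */
	splice = c.namespaces;
	c.namespaces = NULL;
	assert(count_ns(&c) == 0);	/* controller list looks empty now */

	/* Step 2: a removal runs while the queues are being synced. */
	removed = remove_ns(&c, 1);

	/* Step 3: the sync path puts the list back. */
	c.namespaces = splice;

	return removed ? -1 : count_ns(&c);
}
```

In this model simulate_splice_race() returns 1: the removal in step 2 finds nothing to unlink, so the namespace that should be gone is visible again once the list is restored. Whether the real removal path behaves this way depends on how it locates the namespace, which is part of what I would like to confirm.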
Comments on this guess would be appreciated. If it is correct, Tetsuo's
suggestion would be better, even though it adds a new mutex.
--
Shin'ichiro Kawasaki