[PATCH V2] nvme-pci: fix race condition between reset and nvme_dev_disable()
Keith Busch
kbusch at kernel.org
Tue Oct 15 10:57:32 PDT 2024
On Tue, Oct 15, 2024 at 11:27:24AM -0600, Keith Busch wrote:
> On Tue, Oct 15, 2024 at 01:21:00PM +0200, Maurizio Lombardi wrote:
> > -static void nvme_pci_update_nr_queues(struct nvme_dev *dev)
> > +static bool nvme_pci_update_nr_queues(struct nvme_dev *dev)
> >  {
> >  	if (!dev->ctrl.tagset) {
> >  		nvme_alloc_io_tag_set(&dev->ctrl, &dev->tagset, &nvme_mq_ops,
> >  				nvme_pci_nr_maps(dev), sizeof(struct nvme_iod));
> > -		return;
> > +		return true;
> > +	}
> > +
> > +	/* Give up if we are racing with nvme_dev_disable() */
> > +	if (!mutex_trylock(&dev->shutdown_lock))
> > +		return false;
> > +
> > +	/* Check if nvme_dev_disable() has been executed already */
> > +	if (!dev->online_queues) {
> > +		mutex_unlock(&dev->shutdown_lock);
> > +		return false;
> >  	}
> >
> >  	blk_mq_update_nr_hw_queues(&dev->tagset, dev->online_queues - 1);
> >  	/* free previously allocated queues that are no longer usable */
> >  	nvme_free_queues(dev, dev->online_queues);
> > +	mutex_unlock(&dev->shutdown_lock);
>
> I believe mutex_unlock needs to be above blk_mq_update_nr_hw_queues().
> That function needs all the queues to be frozen, so any older IO that
> times out is going to need this lock in order to reclaim it.
Oops, unlocking there doesn't solve your problem. I guess your patch
should be safe as-is since the nvme driver already waits for the freeze
prior to updating the hw queues.
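
For anyone following along, here is a minimal userspace sketch of the
trylock-then-recheck pattern the patch uses. It is only an analogy: it
uses pthreads instead of kernel mutexes, and struct fake_dev,
fake_update_nr_queues() and fake_dev_disable() are hypothetical names
made up for illustration, not driver code. It does not model queue
freezing or timeouts at all.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nvme_dev; only what the pattern needs. */
struct fake_dev {
	pthread_mutex_t shutdown_lock;
	unsigned int online_queues;	/* 0 once the disable path has run */
	unsigned int nr_hw_queues;	/* what the update path would resize to */
};

/* Mirrors the shape of the patched update function, nothing more. */
static bool fake_update_nr_queues(struct fake_dev *dev)
{
	/* Give up instead of blocking if the disable path holds the lock. */
	if (pthread_mutex_trylock(&dev->shutdown_lock) != 0)
		return false;

	/* Lock acquired, but disable may already have completed. */
	if (!dev->online_queues) {
		pthread_mutex_unlock(&dev->shutdown_lock);
		return false;
	}

	/* Safe to resize: disable cannot run while we hold the lock. */
	dev->nr_hw_queues = dev->online_queues - 1;
	pthread_mutex_unlock(&dev->shutdown_lock);
	return true;
}

/* Stand-in for the disable path: takes the lock and takes queues offline. */
static void fake_dev_disable(struct fake_dev *dev)
{
	pthread_mutex_lock(&dev->shutdown_lock);
	dev->online_queues = 0;
	pthread_mutex_unlock(&dev->shutdown_lock);
}
```

The update path returns false in both failure cases (lock contended, or
queues already offline), so a caller can bail out of the reset rather
than operate on queues that the disable path has torn down.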