[PATCH V2] nvme-pci: fix race condition between reset and nvme_dev_disable()
Keith Busch
kbusch at kernel.org
Tue Oct 15 10:27:24 PDT 2024
On Tue, Oct 15, 2024 at 01:21:00PM +0200, Maurizio Lombardi wrote:
> -static void nvme_pci_update_nr_queues(struct nvme_dev *dev)
> +static bool nvme_pci_update_nr_queues(struct nvme_dev *dev)
>  {
>  	if (!dev->ctrl.tagset) {
>  		nvme_alloc_io_tag_set(&dev->ctrl, &dev->tagset, &nvme_mq_ops,
>  				nvme_pci_nr_maps(dev), sizeof(struct nvme_iod));
> -		return;
> +		return true;
> +	}
> +
> +	/* Give up if we are racing with nvme_dev_disable() */
> +	if (!mutex_trylock(&dev->shutdown_lock))
> +		return false;
> +
> +	/* Check if nvme_dev_disable() has been executed already */
> +	if (!dev->online_queues) {
> +		mutex_unlock(&dev->shutdown_lock);
> +		return false;
>  	}
>
>  	blk_mq_update_nr_hw_queues(&dev->tagset, dev->online_queues - 1);
>  	/* free previously allocated queues that are no longer usable */
>  	nvme_free_queues(dev, dev->online_queues);
> +	mutex_unlock(&dev->shutdown_lock);

I believe the mutex_unlock() needs to be above blk_mq_update_nr_hw_queues().
That function needs all the queues to be frozen, so any older IO that
times out is going to need this lock in order to be reclaimed.
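
The ordering suggested above can be sketched as a small single-threaded
userspace model. This is only an illustration of the lock ordering, not
the driver code or the final patch: struct model_dev, model_trylock(),
model_unlock() and model_update_nr_hw_queues() are hypothetical stand-ins
for struct nvme_dev, mutex_trylock(), mutex_unlock() and
blk_mq_update_nr_hw_queues(). The stand-in update function just records
whether shutdown_lock was available while it ran, which is what a timeout
handler would need.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a kernel mutex; single-threaded model only. */
struct model_mutex {
	bool held;
};

/* Hypothetical stand-in for the fields of struct nvme_dev used here. */
struct model_dev {
	struct model_mutex shutdown_lock;
	unsigned int online_queues;
	bool lock_free_during_update;	/* recorded by the stub below */
};

/* Models mutex_trylock(): returns true if the lock was acquired. */
static bool model_trylock(struct model_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

/* Models mutex_unlock(). */
static void model_unlock(struct model_mutex *m)
{
	m->held = false;
}

/*
 * Stand-in for blk_mq_update_nr_hw_queues(): records whether a timeout
 * handler could have taken shutdown_lock while the update is in progress.
 */
static void model_update_nr_hw_queues(struct model_dev *dev)
{
	dev->lock_free_during_update = !dev->shutdown_lock.held;
}

/* Models the suggested ordering: drop the lock before the update. */
static bool model_update_nr_queues(struct model_dev *dev)
{
	/* Give up if the lock is already held (racing with disable) */
	if (!model_trylock(&dev->shutdown_lock))
		return false;

	/* Give up if the disable path has already run */
	if (!dev->online_queues) {
		model_unlock(&dev->shutdown_lock);
		return false;
	}

	/* Unlock first, so the lock is available during the update */
	model_unlock(&dev->shutdown_lock);
	model_update_nr_hw_queues(dev);
	return true;
}
```

With the unlock placed after the update instead, the stub would record
lock_free_during_update as false, which is the situation described above
where a timed-out IO cannot get the lock.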