[PATCH 3/3] nvme: freeze both the ns and multipath queues whe updating queue limits

Martin Wilck mwilck at suse.com
Mon Sep 2 06:08:45 PDT 2024


On Mon, 2024-09-02 at 12:36 +0200, Hannes Reinecke wrote:
> On 8/29/24 16:34, Hannes Reinecke wrote:
> > On 8/29/24 08:31, Christoph Hellwig wrote:
> > > We really should not have any I/O outstanding while updating the
> > > queue
> > > limits, as that could introduce inconsistencies between the
> > > multipath
> > > and underlying nodes.  So always freeze the multipath node first
> > > and then
> > > the underlying one, matching the order in nvme_passthru_start.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch at lst.de>
> > > ---
> > >   drivers/nvme/host/core.c | 41 ++++++++++++++++++++++++++++-----
> > > -------
> > >   1 file changed, 29 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > index 81d2a09dd00f30..0f01a0bf976fd3 100644
> > > --- a/drivers/nvme/host/core.c
> > > +++ b/drivers/nvme/host/core.c
> [  .. ]
> > > @@ -2254,7 +2272,9 @@ static int nvme_update_ns_info_block(struct
> > > nvme_ns *ns,
> > >   done:
> > >       set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
> > >       set_bit(NVME_NS_READY, &ns->flags);
> > > -    blk_mq_unfreeze_queue(ns->disk->queue);
> > > +    if (nvme_ns_head_multipath(ns->head))
> > > +        ret = nvme_update_ns_info_mpath(ns, info, unsupported);
> > > +    nvme_ns_unfreeze(ns);
> > >       if (blk_queue_is_zoned(ns->queue)) {
> > >           ret = blk_revalidate_disk_zones(ns->disk);
> > 
> > While we're shuffling things around; doesn't the mean that we
> > unfreeze 
> > the queue (and allow I/O) before we've validated all zones?
> > Wouldn't it be better if we validate zoned before unfreezing
> > queues?
> > 
> > Otherwise it looks good; I'll give it a spin on my map/unmap
> > testcases.
> > 
> Hmm. Turns out we're not quite there yet.
> We call 'set_capacity_and_notify()' unconditionally a few lines up, 
> before setting the namespace to 'READY'. And then we're doing another
> call to 'set_capacity_and_notify()' (from
> nvme_update_ns_info_mpath()).
> In my map/unmap tests I've seen a stall with udev, where the initial
> scanning process it stuck in device_add_disk() trying to read the 
> partition table (and waiting for the uptodate bit to clear), and the 
> udev process closing the bdev and waiting on
> 'truncate_filemap_range').

I don't understand how the udev stall came to pass. The path device
(ns->disk) is hidden, thus the set_capacity_and_notify() call for it
doesn't trigger an uevent.

> 
> With this patch:
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fdd67a55a477..8858e5bf49de 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2258,8 +2258,6 @@ static int nvme_update_ns_info_block(struct 
> nvme_ns *ns,
>                  goto out;
>          }
> 
> -       set_capacity_and_notify(ns->disk, capacity);
> -
>          /*
>           * Only set the DEAC bit if the device guarantees that reads
> from
>           * deallocated data return zeroes.  While the DEAC bit does
> not
> @@ -2274,6 +2272,8 @@ static int nvme_update_ns_info_block(struct 
> nvme_ns *ns,
>          set_bit(NVME_NS_READY, &ns->flags);
>          if (nvme_ns_head_multipath(ns->head))
>                  ret = nvme_update_ns_info_mpath(ns, info,
> unsupported);
> +       else
> +               set_capacity_and_notify(ns->disk, capacity);
>          nvme_ns_unfreeze(ns);
> 

If the capacity actually changes, this would mean that we don't call
set_capacity() any more for ns->disk. Which means that
bdev_nr_sectors(ns->disk->part0) will not be updated, and that the
later call to set_capacity_and_notify() for the multipath device will
use the outdated value, as it uses the path device's capacity to update
the multipath device's:

	set_capacity_and_notify(disk, get_capacity(ns->disk));

Regards,
Martin




More information about the Linux-nvme mailing list