[PATCH 1/2] nvme: make NVMe freeze API reliably

Tue Sep 6 19:06:09 PDT 2022

On Wed, Sep 07, 2022 at 09:18:52AM +0800, Chao Leng wrote:
> 
> 
> On 2022/9/7 8:33, Ming Lei wrote:
> > On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
> > > 
> > > 
> > > On 2022/9/6 16:45, Ming Lei wrote:
> > > > On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
> > > > > 
> > > > > 
> > > > > On 2022/8/21 16:47, Ming Lei wrote:
> > > > > > From: Keith Busch <kbusch at kernel.org>
> > > > > > 
> > > > > > In some corner cases[1], freeze wait and unfreeze API may be called on
> > > > > > unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> > > > > > freeze APIs more reliably, then this kind of issues can be avoided.
> > > > > > And similar approach has been applied on stopping/quiescing nvme queues.
> > > > > This leads to another problem: the process that needs to be
> > > > > in the frozen state is not actually frozen.
> > > > > It's not safe.
> > > > 
> > > > The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
> > > > can be done only the flag is set. Not sure how it isn't safe.
> > > I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
> > > If just set_bit in nvme_start_freeze, it will cause another problem in
> > > below scenario.
> > > A: start freeze and set the bit;B:start freeze and set the bit;
> > > and then
> > > A:test and clear the bit, and unfreeze;B: test and skip;
> > > The queue will be frozen for ever.
> > 
> > One simple approach is to replace down_read(->namespaces_rwsem) with
> > down_write(->namespaces_rwsem) in nvme_start_freeze() and
> > nvme_unfreeze().
> > 
> > > 
> > > In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
> > > No matter how to use NVME_NS_FREEZE , it may cause problems.
> > > The freeze mechanism is perfect, and no additional protection mechanism is required.
> > 
> > block layer requires queue freeze and unfreeze APIs to be called in
> > pair strictly, that is why I add the 1st patch.
> From your bug analysis, the reason is that nvme_wait_freeze is called without nvme_start_freeze.
> patch 2/2 is already delete the nvme_wait_freeze.
> If there is another bug of unmatched freeze and unfreeze,
> can you describe the analysis of unmatched freeze and unfreeze?
> The current patch 1/2 will introduce the unmatched freeze and unfreeze rather than solved it.
> Maybe another patch is needed to fix the bug.

Please look at nvme_reset_work():
		...
		if (dev->online_queues < 2) {
			...
        } else {
                nvme_start_queues(&dev->ctrl);
				nvme_wait_freeze(&dev->ctrl);
                if (!dev->ctrl.tagset)
                        nvme_pci_alloc_tag_set(dev);
                else
                        nvme_pci_update_nr_queues(dev);
                nvme_dbbuf_set(dev);
                nvme_unfreeze(&dev->ctrl);
		}
		...

nvme_wait_freeze() is called on unfrozen queue, so io hang is caused.

If nvme_unfreeze() is called on unfrozen queue, WARN_ON_ONCE() in
__blk_mq_unfreeze_queue() will be triggered.

Thanks,
Ming