[PATCH 1/2] nvme: make NVMe freeze API reliably

Tue Sep 6 22:58:14 PDT 2022


On 2022/9/7 10:06, Ming Lei wrote:
> On Wed, Sep 07, 2022 at 09:18:52AM +0800, Chao Leng wrote:
>>
>>
>> On 2022/9/7 8:33, Ming Lei wrote:
>>> On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
>>>>
>>>>
>>>> On 2022/9/6 16:45, Ming Lei wrote:
>>>>> On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>>>>>>
>>>>>>
>>>>>> On 2022/8/21 16:47, Ming Lei wrote:
>>>>>>> From: Keith Busch <kbusch at kernel.org>
>>>>>>>
>>>>>>> In some corner cases[1], freeze wait and unfreeze API may be called on
>>>>>>> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
>>>>>>> freeze APIs more reliably, then this kind of issues can be avoided.
>>>>>>> And similar approach has been applied on stopping/quiescing nvme queues.
>>>>>> This leads to another problem: the process that needs to be
>>>>>> in the frozen state is not actually frozen.
>>>>>> It's not safe.
>>>>>
>>>>> The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
>>>>> can be done only the flag is set. Not sure how it isn't safe.
>>>> I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
>>>> If just set_bit in nvme_start_freeze, it will cause another problem in
>>>> below scenario.
>>>> A: start freeze and set the bit;B:start freeze and set the bit;
>>>> and then
>>>> A:test and clear the bit, and unfreeze;B: test and skip;
>>>> The queue will be frozen for ever.
>>>
>>> One simple approach is to replace down_read(->namespaces_rwsem) with
>>> down_write(->namespaces_rwsem) in nvme_start_freeze() and
>>> nvme_unfreeze().
>>>
>>>>
>>>> In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
>>>> No matter how to use NVME_NS_FREEZE , it may cause problems.
>>>> The freeze mechanism is perfect, and no additional protection mechanism is required.
>>>
>>> block layer requires queue freeze and unfreeze APIs to be called in
>>> pair strictly, that is why I add the 1st patch.
>>  From your bug analysis, the reason is that nvme_wait_freeze is called without nvme_start_freeze.
>> patch 2/2 is already delete the nvme_wait_freeze.
>> If there is another bug of unmatched freeze and unfreeze,
>> can you describe the analysis of unmatched freeze and unfreeze?
>> The current patch 1/2 will introduce the unmatched freeze and unfreeze rather than solved it.
>> Maybe another patch is needed to fix the bug.
> 
> Please look at nvme_reset_work():
> 		...
> 		if (dev->online_queues < 2) {
> 			...
>          } else {
>                  nvme_start_queues(&dev->ctrl);
> 				nvme_wait_freeze(&dev->ctrl);
>                  if (!dev->ctrl.tagset)
>                          nvme_pci_alloc_tag_set(dev);
>                  else
>                          nvme_pci_update_nr_queues(dev);
>                  nvme_dbbuf_set(dev);
>                  nvme_unfreeze(&dev->ctrl);
> 		}
> 		...
> 
> nvme_wait_freeze() is called on unfrozen queue, so io hang is caused.
> 
> If nvme_unfreeze() is called on unfrozen queue, WARN_ON_ONCE() in
> __blk_mq_unfreeze_queue() will be triggered.
The bug is specific to the PCI transport layer.
It might be a better option to fix this bug by adding flag for normal connection
and abnormal recovery like other transport layers, and then perform different processing.
Different processes are processed differently to avoid unexpected nvme_unfreeze.
> 
> 
> 
> Thanks,
> Ming
> 
> .
>