[PATCH 1/2] nvme: make NVMe freeze API reliably

Chao Leng lengchao at huawei.com
Tue Sep 6 18:18:52 PDT 2022



On 2022/9/7 8:33, Ming Lei wrote:
> On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
>>
>>
>> On 2022/9/6 16:45, Ming Lei wrote:
>>> On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>>>>
>>>>
>>>> On 2022/8/21 16:47, Ming Lei wrote:
>>>>> From: Keith Busch <kbusch at kernel.org>
>>>>>
>>>>> In some corner cases[1], the freeze wait and unfreeze APIs may be called on an
>>>>> unfrozen queue. Add a per-ns flag, NVME_NS_FREEZE, to make these
>>>>> freeze APIs more reliable, so that this kind of issue can be avoided.
>>>>> A similar approach has already been applied to stopping/quiescing nvme queues.
>>>> This leads to another problem: the process that needs to be
>>>> in the frozen state is not actually frozen.
>>>> It's not safe.
>>>
>>> The flag just controls whether the queue wait is needed: blk_mq_freeze_queue_wait
>>> is called only if the flag is set. Not sure how that isn't safe.
>> I thought NVME_NS_FREEZE was used in the same way as NVME_NS_STOPPED.
>> If nvme_start_freeze just does set_bit, it causes another problem in the
>> scenario below.
>> A: start freeze and set the bit; B: start freeze and set the bit;
>> and then
>> A: test and clear the bit, and unfreeze; B: test and skip.
>> The queue will stay frozen forever.
> 
> One simple approach is to replace down_read(->namespaces_rwsem) with
> down_write(->namespaces_rwsem) in nvme_start_freeze() and
> nvme_unfreeze().
> 
>>
>> In addition, I think patch 2/2 fixes the bug well, so patch 1/2 is not necessary.
>> No matter how NVME_NS_FREEZE is used, it may cause problems.
>> The freeze mechanism is sufficient as it is, and no additional protection mechanism is required.
> 
> The block layer requires the queue freeze and unfreeze APIs to be called strictly
> in pairs, which is why I added the 1st patch.
From your bug analysis, the cause is that nvme_wait_freeze is called without nvme_start_freeze.
Patch 2/2 already deletes that nvme_wait_freeze call.
If there is another bug caused by unmatched freeze and unfreeze,
can you describe the analysis of it?
The current patch 1/2 introduces unmatched freeze and unfreeze rather than solving it.
Maybe another patch is needed to fix that bug.
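
To illustrate the A/B scenario quoted above, here is a minimal user-space
sketch in C. It is only a model, not kernel code: struct model_queue,
model_start_freeze(), model_unfreeze() and run_two_freezers() are hypothetical
names. freeze_depth stands in for the nesting freeze count kept by the block
layer, and freeze_flag stands in for the NVME_NS_FREEZE bit, assuming the bit
is set unconditionally on start freeze and tested and cleared on unfreeze.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one namespace queue, not the kernel structure. */
struct model_queue {
	int freeze_depth;	/* models the nesting freeze count */
	bool freeze_flag;	/* models the NVME_NS_FREEZE bit */
};

/* Models start freeze: the bit is set unconditionally, the depth nests. */
static void model_start_freeze(struct model_queue *q)
{
	q->freeze_flag = true;	/* like set_bit(): a second caller is not recorded */
	q->freeze_depth++;
}

/* Models unfreeze: only the caller that finds the bit set really unfreezes. */
static void model_unfreeze(struct model_queue *q)
{
	if (!q->freeze_flag)
		return;		/* like test_and_clear_bit() returning 0: skip */
	q->freeze_flag = false;
	assert(q->freeze_depth > 0);
	q->freeze_depth--;
}

/* A and B both start freeze, then both unfreeze; returns the leftover depth. */
static int run_two_freezers(void)
{
	struct model_queue q = { 0, false };

	model_start_freeze(&q);	/* A: start freeze and set the bit */
	model_start_freeze(&q);	/* B: start freeze and set the bit */
	model_unfreeze(&q);	/* A: test and clear the bit, and unfreeze */
	model_unfreeze(&q);	/* B: test and skip */
	return q.freeze_depth;
}
```

In this model run_two_freezers() leaves freeze_depth at 1, so the queue never
becomes unfrozen, while a single start freeze/unfreeze pair returns the depth
to 0. A single bit cannot record two nested freezers.
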
> 
> 
> 
> Thanks,
> Ming
> 
> .
> 


