[PATCH] nvme: Remove namespace when nvme_identify_ns_descs() failed

Nilay Shroff nilay at linux.ibm.com
Wed Jan 15 00:18:54 PST 2025



On 1/15/25 1:32 PM, Hannes Reinecke wrote:
> On 1/15/25 08:48, Nilay Shroff wrote:
>>
>>
>> On 1/13/25 7:59 PM, Hannes Reinecke wrote:
>>> On 1/13/25 15:12, Nilay Shroff wrote:
>>>>
>>>>
>>>> On 1/13/25 1:13 PM, Hannes Reinecke wrote:
>>>>> On 1/11/25 15:01, Nilay Shroff wrote:
>>>>>>
>>>>>>
>>> [ .. ]
>>>>> So my argument is that in this specific case the 'ANA inaccessible' nvme
>>>>> state should _not_ be retried, but should be treated as identical to
>>>>> 'invalid namespace' errors.
>>>>>
>>>> I think I got what you're trying to propose. So when this issue manifests on the host, we
>>>> could possibly differentiate between the reasons nvme_identify_ns_descs() failed: did it fail
>>>> because the nsid has been removed/un-mapped on the target, or did it fail due to the "ANA inaccessible"
>>>> state? IMO, for "ANA inaccessible" status, we may not want to immediately remove the ns from
>>>> the host (due to the reason I mentioned earlier per NVMe spec section 8.1.3.3); however, for the
>>>> other error case we may remove the ns from the host.
>>>> I think issuing the ns descriptor list command to the target for an nsid which doesn't exist on the
>>>> target would return a buffer filled with all zeros. So that might be an indication that the ns has
>>>> been removed from the target.
>>>>    
>>> But only if the NSID has not been remapped in the meantime.
>>> If it has (as in my case) the ns descriptor list will be valid, it just
>>> refers to another namespace.
>>>
>> If the NSID has been unmapped and then remapped on the target, then the
>> host would hit the uuid mismatch case (under nvme_validate_ns()) and so the host
>> would then remove the namespace.
>>
>> I think there are two cases,
>> Case1:
>> 1. AEN triggers rescan
>> 2. List of active nsid is retrieved
>> -> NSID A is removed on the target
>> 3. Scanning of NSID A fails (i.e. nvme_identify_ns_descs() returns buffer filled with all zeros)
>> -> host removes the respective namespace
>>
>> Case2:
>> 1. AEN triggers rescan
>> 2. List of active nsid is retrieved
>> -> NSID A is unmapped and remapped (possibly with different uuid) on target
>> 3. Scanning of NSID A succeeds
>> 4. host finds the mismatch uuid for NSID A (i.e. nvme_validate_ns() fails)
>> -> host removes the respective namespace
>>   
> Entirely correct.
> But Case2 results in the new namespace never being scanned, and not being visible to the OS. Which is the error I'm fighting with.
> 
OK, but then it seems that your proposed patch doesn't address Case2, does it?
It appears to me that the patch tries to address Case1, but for the error code
"ANA inaccessible" with DNR set. IMO for Case2, we may want to queue the
scan again if nvme_validate_ns() fails due to the uuid mismatch.
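
To make the distinction between the two cases concrete, here is a rough
user-space sketch of the decision I have in mind. It is not driver code:
scan_action, ns_ids, buf_is_zero() and classify_rescan() are names made up
for illustration, and the all-zeros check rests on my assumption above about
what the target returns for a non-existent nsid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of scanning one NSID; not a kernel type. */
enum scan_action {
	SCAN_KEEP_NS,            /* descriptors match what the host knows */
	SCAN_REMOVE_NS,          /* Case1: NSID is gone on the target */
	SCAN_REMOVE_AND_RESCAN,  /* Case2: NSID now maps to another namespace */
};

/* Hypothetical, reduced set of namespace identifiers (uuid only). */
struct ns_ids {
	unsigned char uuid[16];
};

/* Return true if every byte of the descriptor buffer is zero. */
static bool buf_is_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return false;
	return true;
}

/*
 * Decide what to do with an NSID after a rescan.
 * descs/len: raw descriptor list buffer returned for the NSID
 * known:     identifiers the host recorded for the existing namespace
 * reported:  identifiers parsed from the new descriptor list
 */
enum scan_action classify_rescan(const unsigned char *descs, size_t len,
				 const struct ns_ids *known,
				 const struct ns_ids *reported)
{
	assert(descs && known && reported);

	/* Case1 (assumption): all-zero buffer means the NSID was removed. */
	if (buf_is_zero(descs, len))
		return SCAN_REMOVE_NS;

	/* Case2: uuid mismatch, remove the stale ns and scan again. */
	if (memcmp(known->uuid, reported->uuid, sizeof(known->uuid)))
		return SCAN_REMOVE_AND_RESCAN;

	return SCAN_KEEP_NS;
}
```

In the driver, the SCAN_REMOVE_AND_RESCAN outcome would presumably translate
to removing the old namespace and then queueing the scan work once more, so
that the remapped NSID gets picked up as a new namespace.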

Thanks,
--Nilay



