[PATCH] nvme: Remove namespace when nvme_identify_ns_descs() failed

Sagi Grimberg sagi at grimberg.me
Fri Jan 10 15:16:15 PST 2025




On 08/01/2025 17:45, Hannes Reinecke wrote:
> On 1/8/25 11:49, Sagi Grimberg wrote:
>>
>>
>>
>> On 07/01/2025 10:11, Hannes Reinecke wrote:
>>> On 12/25/24 10:58, Sagi Grimberg wrote:
>>>>
>>>>
>>>>
>>>> On 29/11/2024 16:06, Hannes Reinecke wrote:
>>>>> When a namespace gets unmapped on the target during scanning
>>>>> nvme_identify_ns_descs() returns with a non-retryable error.
>>>>> With the current code we will ignore that error on the grounds
>>>>> that we failed to get information, and hence cannot make any
>>>>> decisions whether to keep or remove that namespace.
>>>>> But a non-retryable error implies that the namespace is _not_
>>>>> present as we cannot retry that command and will never get
>>>>> information about that namespace.
>>>>> And we need to remove the namespace during scanning, as otherwise
>>>>> the AEN informing us about a namespace change will find the NSID
>>>>> present, but nvme_validate_ns() will fail, and the namespace
>>>>> will never be updated with the correct information.
>>>>
>>>> Isn't that a bit harsh?
>>>> I would expect to see a specific status like NVME_SC_INVALID_NS or
>>>> equivalent for a full removal of the namespace?
>>>
>>> Does it matter? If we get a DNR status back from 
>>> nvme_identify_ns_descs() we _cannot_ resend that command.
>>> Meaning we cannot get the namespace descriptors. As we
>>> rely on these descriptors to properly map the namespace
>>> we cannot correctly work with it, and we're better off
>>> pretending the namespace is gone and waiting for an AEN
>>> indicating that the namespace (or controller) state has changed.
>>
>> I think it does matter. I don't think we should be removing the NS
>> without the controller telling us that it is actually removed.
>
> But what would be the recovery action here?
> If the 'identify ns descs' command cannot be retried, how are
> we going to map the namespace to an ns_head?

Let's take a step back here. Can you describe the scenario you hit?
What was the error status that you observed?
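
To make the two positions in this thread concrete, here is a rough
user-space sketch, not the kernel code. The SKETCH_* constants are local
stand-ins whose values are assumed to mirror include/linux/nvme.h, and
the enum and both policy functions are hypothetical names used only for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; values are assumed to mirror include/linux/nvme.h. */
#define SKETCH_SC_INVALID_NS 0x00b   /* invalid namespace status code */
#define SKETCH_SC_MASK       0x7ff   /* status code type + status code */
#define SKETCH_DNR           0x4000  /* Do Not Retry bit */

/* Hypothetical outcome of scanning one namespace. */
enum ns_action { NS_KEEP, NS_REMOVE };

/* Position of the patch: any non-retryable error removes the namespace. */
static enum ns_action policy_remove_on_dnr(uint16_t status)
{
	return (status & SKETCH_DNR) ? NS_REMOVE : NS_KEEP;
}

/*
 * Position of the review: remove only when the controller explicitly
 * reports the namespace as invalid, whatever the DNR bit says.
 */
static enum ns_action policy_remove_on_invalid_ns(uint16_t status)
{
	bool invalid_ns = (status & SKETCH_SC_MASK) == SKETCH_SC_INVALID_NS;

	return invalid_ns ? NS_REMOVE : NS_KEEP;
}
```

The two policies only disagree for a status that has DNR set but is not
an invalid-namespace status, which is why the exact status observed
matters here.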


