[PATCH] nvme: Remove namespace when nvme_identify_ns_descs() failed

Hannes Reinecke hare at suse.de
Tue Dec 3 23:14:08 PST 2024


On 12/3/24 20:15, Keith Busch wrote:
> On Fri, Nov 29, 2024 at 03:06:08PM +0100, Hannes Reinecke wrote:
>> When a namespace gets unmapped on the target during scanning,
>> nvme_identify_ns_descs() returns with a non-retryable error.
>> With the current code we will ignore that error on the grounds
>> that we failed to get information, and hence cannot make any
>> decisions whether to keep or remove that namespace.
>> But a non-retryable error implies that the namespace is _not_
>> present as we cannot retry that command and will never get
>> information about that namespace.
>> And we need to remove the namespace during scanning, as otherwise
>> the AEN informing us about a namespace change will find the NSID
>> present, but nvme_validate_ns() will fail, and the namespace
>> will never be updated with the correct information.
> 
> The scanning only checks namespaces returned in the "active" namespace
> list. Every namespace not in the active list gets removed already. Why
> is this unmapped namespace appearing on the active list?

Timing. Imagine a system used as a backing store for Kubernetes, where
namespaces come and go at a _really_ fast pace.
1) AEN triggers a rescan
2) List of active namespaces is retrieved
-> NSID A gets unmapped (or moved to another node in the cluster)
3) Scan of NSID A returns an error with DNR set.
Without this patch we keep the namespace around, so eventually we'll
trip over the 'non-matching UUID' check once the NSID is reused.
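
To make the intended behaviour concrete, here is a minimal userspace sketch
of the decision taken in step 3. It is not the patch itself: all sketch_*
names are hypothetical, and the value of the DNR bit is an assumption chosen
for illustration only.

```c
#include <stdbool.h>

/* Assumption: stand-in for the "Do Not Retry" bit in the command status.
 * The value is illustrative and not taken from the patch. */
#define SKETCH_SC_DNR 0x4000

enum sketch_scan_action {
	SKETCH_NS_KEEP,		/* descriptors were read, keep the namespace */
	SKETCH_NS_IGNORE_ERR,	/* retryable error: no information, leave as is */
	SKETCH_NS_REMOVE,	/* DNR set: the namespace is gone, remove it */
};

/* A positive status with DNR set means retrying can never succeed. */
static bool sketch_is_non_retryable(int status)
{
	return status > 0 && (status & SKETCH_SC_DNR);
}

/* Map the result of the descriptor lookup to a scan action. */
static enum sketch_scan_action sketch_scan_decision(int status)
{
	if (status == 0)
		return SKETCH_NS_KEEP;
	if (sketch_is_non_retryable(status))
		return SKETCH_NS_REMOVE;
	return SKETCH_NS_IGNORE_ERR;
}
```

The point of the third case is that only a non-retryable error is treated
as "namespace not present"; any other failure still leaves the namespace
alone, as the current code does.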

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


