[PATCH] nvme: Remove namespace when nvme_identify_ns_descs() failed

Keith Busch kbusch at kernel.org
Tue Jan 7 08:01:55 PST 2025


On Fri, Dec 06, 2024 at 01:41:08PM +0100, Hannes Reinecke wrote:
> On 12/5/24 17:15, Keith Busch wrote:
> > On Thu, Dec 05, 2024 at 01:30:39PM +0100, Hannes Reinecke wrote:
> > > On 12/4/24 17:39, Keith Busch wrote:
> > > > > 1) AEN triggers a rescan
> > > > > 2) List of active namespaces is retrieved
> > > > > -> NSID A gets unmapped (or moved to another node in the cluster)
> > > > > 3) Scan of NSID A returns an error with DNR set.
> > > > > Without this patch we keep the namespace around, so eventually we'll
> > > > > trip over the 'non-matching UUID' check once the NSID is reused.
> > > > 
> > > > I'm still not sure that makes sense. The target shouldn't attach the new
> > > > namespace until the host acknowledges the removal of the older NSID via
> > > > the Namespace Change List log. Until the log is read, the inventory for
> > > > removed namespaces should be latched. Otherwise, timing might remove+add
> > > > a specific NSID before the host requests the NS Descriptor for the
> > > > racing removal, then it would just get the "non-matching UUID" issue
> > > > anyway.
> > > 
> > > But we read the Namespace Change List log in step 2)
> > > (Not that we're doing anything with it, but that's another story...)
> > > Hmm?
> > 
> > Indeed. So maybe we should just move the log page retrieval to *after* we
> > process the identify active namespace list?
> 
> Not sure how that would help. We are getting an 'ANA inaccessible' with DNR
> set status when retrieving the NS descriptor list for the namespace.
> And this has to happen after we read the list of active namespaces.
> Perfectly legit, but doesn't tell us anything if the namespace is present at
> all.
> All we know is that we cannot get information about that, and my argument is
> that we should treat this as equivalent to a namespace
> not present.
> 
> And I really don't want to delay clearing of the AEN, as that would
> open the door for us to miss subsequent AENs, getting even more out-of-sync
> with the target.

I just thought it would be cleaner if the driver could observe that the
removed namespace is absent from the identify active namespace list,
so that all removals can happen in a single path.
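
To illustrate what "a single path" could mean, here is a rough userspace
sketch, not driver code. It assumes the host keeps a simple table of known
NSIDs and prunes whatever is missing from the active list. The names
host_ns, nsid_is_active() and prune_inactive() are hypothetical and do not
exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side record; NOT the kernel's struct nvme_ns. */
struct host_ns {
	uint32_t nsid;
	bool present;
};

/* Return true if nsid appears in the active list reported by the controller. */
static bool nsid_is_active(const uint32_t *active, size_t n_active,
			   uint32_t nsid)
{
	for (size_t i = 0; i < n_active; i++)
		if (active[i] == nsid)
			return true;
	return false;
}

/*
 * Single removal path: mark every known namespace that is missing from
 * the active list as gone. Returns the number of namespaces removed.
 */
static size_t prune_inactive(struct host_ns *known, size_t n_known,
			     const uint32_t *active, size_t n_active)
{
	size_t removed = 0;

	for (size_t i = 0; i < n_known; i++) {
		if (known[i].present &&
		    !nsid_is_active(active, n_active, known[i].nsid)) {
			known[i].present = false;
			removed++;
		}
	}
	return removed;
}
```

With removals decided only from the active list, an error on a later
per-namespace command would not need its own removal logic.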

What I'm worried about with your proposal is that it indicates we can
get a rapid remove + add sequence such that timing may create a
condition where instead of getting "ANA inaccessible w/ DNR", we'd
observe a mismatched UUID.
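
A toy model of that timing window, as a sketch only: it assumes the
outcome of the per-namespace scan depends solely on the target's state at
the moment the NS descriptor is requested. The types target_ns and
scan_result and the function scan_nsid() are made up for illustration,
and the "not attached means error with DNR" rule is a simplification of
the "ANA inaccessible w/ DNR" case discussed above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outcomes of scanning one NSID; not kernel status codes. */
enum scan_result {
	SCAN_OK,		/* descriptor read, UUID matches host's record */
	SCAN_ERR_DNR,		/* descriptor read failed, do-not-retry set */
	SCAN_UUID_MISMATCH,	/* descriptor read, but UUID differs */
};

/* Hypothetical snapshot of what the target exposes for one NSID. */
struct target_ns {
	bool attached;
	uint8_t uuid[16];
};

/*
 * Model the descriptor request for an NSID the host already knows.
 * The result depends only on the target state at request time.
 */
static enum scan_result scan_nsid(const struct target_ns *t,
				  const uint8_t host_uuid[16])
{
	if (!t->attached)
		return SCAN_ERR_DNR;	/* request lands after the remove */
	if (memcmp(t->uuid, host_uuid, 16) != 0)
		return SCAN_UUID_MISMATCH; /* request lands after remove + add */
	return SCAN_OK;
}
```

In this model the same host request yields either the DNR error or the
mismatched UUID, depending only on whether the re-add has happened yet.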


