[PATCHv3] nvme: correctly account for namespace head reference counter
Hannes Reinecke
hare at suse.de
Thu Jun 26 00:34:37 PDT 2025
On 6/26/25 07:19, Nilay Shroff wrote:
> The blktests nvme/058 manifests an issue where the NVMe subsystem
> kobject entry remains stale in sysfs, causing a failure during
> subsequent NVMe module reloads[1]. Specifically, when attempting to
> register a new NVMe subsystem, the driver encounters a kobject name
> collision because a stale kobject still exists. Note, however, that
> nvme/058 itself passes without reporting any failure; the stale NVMe
> subsystem kobject entry in sysfs causes the observed symptom[1] only
> during subsequent NVMe module reloads.
>
> This issue stems from an imbalance in the get/put usage of the namespace
> head (nshead) reference counter. The nshead holds a reference to the
> associated NVMe subsystem. If the nshead reference is not properly
> released, it prevents the cleanup of the subsystem's kobject, leaving
> a stale NVMe subsystem entry behind in sysfs.
>
> In the failure case, the last namespace path referencing a nshead
> is removed, but the nshead reference is not released. This occurs
> because the release logic currently only puts the nshead reference
> when its state is LIVE. However, in configurations where ANA (Asymmetric
> Namespace Access) is enabled, a namespace may be associated with an ANA
> state that is neither optimized nor non-optimized. In this case, the
> nshead may never transition to LIVE, and the corresponding nshead
> reference is then never dropped. In fact, nvme/058 associates some
> NVMe namespaces with an inaccessible ANA state, so the nshead is
> created but its state never transitions to LIVE. The current logic
> therefore leaks the nshead reference for non-LIVE states.
>
> In another scenario, during namespace allocation, the driver first
> allocates a nshead and then issues an Identify Namespace command. If
> this command fails, which can happen in tests like nvme/058 that
> rapidly enable and disable namespaces, we must release the reference
> to the newly allocated nshead. However, this reference release is
> currently missing in the failure path, causing a nshead reference leak.
>
> To fix this, we now unconditionally release the nshead reference when
> the last NVMe path referencing the nshead is removed, regardless of
> the head's state. Also, in the Identify Namespace failure case, we now
> properly release the nshead reference. This ensures proper cleanup
> of the nshead and, consequently, of the NVMe subsystem and its
> associated kobject.
>
> This change prevents stale kobject entries from lingering in sysfs and
> eliminates the module reload failures observed just after running
> nvme/058.
>
> [1] https://lore.kernel.org/all/CAHj4cs8fOBS-eSjsd5LUBzy7faKXJtgLkCN+mDy_-ezCLLLq+Q@mail.gmail.com/
>
> Reported-by: yi.zhang at redhat.com
> Closes: https://lore.kernel.org/all/CAHj4cs8fOBS-eSjsd5LUBzy7faKXJtgLkCN+mDy_-ezCLLLq+Q@mail.gmail.com/
> Fixes: 62188639ec16 ("nvme-multipath: introduce delayed removal of the multipath head node")
> Tested-by: yi.zhang at redhat.com
> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
> ---
> Changes from v2:
> - Fix typos in the commit message (hch)
> Changes from v1:
> - Avoid double free of nshead when multipath is not configured.
> Link to V1: https://lore.kernel.org/all/c2e2aa93-9213-4322-a95d-27447f8b08de@linux.ibm.com/t/#u
> ---
> drivers/nvme/host/core.c | 16 +++++++++++++++-
> drivers/nvme/host/multipath.c | 5 ++++-
> 2 files changed, 19 insertions(+), 2 deletions(-)
>
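For readers following along, the two leaks described in the quoted commit message can be modeled with a small userspace sketch. This is not the kernel code: the `toy_*` types and every function name below are hypothetical, plain integers stand in for the real reference counters, and the delayed-removal logic from the Fixes: commit is left out entirely. It only illustrates why a put that is conditional on the LIVE state, or a missing put on the Identify Namespace failure path, leaves the subsystem pinned.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the subsystem and the namespace head. */
struct toy_subsys {
    int refs;                   /* references held on the subsystem */
};

struct toy_head {
    int refs;                   /* references held on the head */
    bool live;                  /* true only once the head went LIVE */
    struct toy_subsys *subsys;  /* subsystem pinned by this head */
};

/* Allocating a head takes one head reference and pins the subsystem. */
static void toy_head_init(struct toy_head *h, struct toy_subsys *s)
{
    h->refs = 1;
    h->live = false;
    h->subsys = s;
    s->refs++;
}

/* Dropping the last head reference unpins the subsystem. */
static void toy_head_put(struct toy_head *h)
{
    if (--h->refs == 0)
        h->subsys->refs--;
}

/* Behaviour described as buggy: the put happens only for LIVE heads. */
static void remove_last_path_old(struct toy_head *h)
{
    if (h->live)
        toy_head_put(h);
}

/* Behaviour described as the fix: the put is unconditional. */
static void remove_last_path_new(struct toy_head *h)
{
    toy_head_put(h);
}

/* Subsystem references left after removing the last path (0 = clean). */
static int leaked_after_path_removal(bool live, bool fixed)
{
    struct toy_subsys s = { .refs = 0 };
    struct toy_head h;

    toy_head_init(&h, &s);
    h.live = live;
    if (fixed)
        remove_last_path_new(&h);
    else
        remove_last_path_old(&h);
    return s.refs;
}

/* Subsystem references left when Identify Namespace fails after alloc. */
static int leaked_after_identify_failure(bool fixed)
{
    struct toy_subsys s = { .refs = 0 };
    struct toy_head h;

    toy_head_init(&h, &s);
    /* The Identify Namespace command is assumed to fail at this point. */
    if (fixed)
        toy_head_put(&h);   /* the release the commit message says was missing */
    return s.refs;
}
```

In this model, a head that never went LIVE keeps one subsystem reference under the old removal path and none under the new one, which corresponds to the stale subsystem kobject described in the quoted text.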
Reviewed-by: Hannes Reinecke <hare at suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich