[PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled

Sasha Levin sashal at kernel.org
Tue May 5 02:51:43 PDT 2026


From: John Garry <john.g.garry at oracle.com>

[ Upstream commit 3f150f0f010f234f34a67897344f18e68fe803f7 ]

The delayed disk removal work is canceled when a NS (re)appears. However,
we do not put the module reference grabbed in nvme_mpath_remove_disk(), so
fix that.

Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Nilay Shroff <nilay at linux.ibm.com>
Reviewed-by: Chaitanya Kulkarni <kch at nvidia.com>
Signed-off-by: John Garry <john.g.garry at oracle.com>
Signed-off-by: Keith Busch <kbusch at kernel.org>
Signed-off-by: Sasha Levin <sashal at kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Decision Summary
This should be backported to stable trees that contain NVMe multipath
delayed removal, especially `v6.16+` trees checked locally. It fixes a
real module reference leak: `nvme_mpath_remove_disk()` takes a
`THIS_MODULE` reference before scheduling delayed removal work, but
`nvme_init_ns_head()` can cancel that work when a namespace reappears.
If canceled, the work callback never runs and therefore never executes
its existing `module_put(THIS_MODULE)`. The patch balances that
reference only when `cancel_delayed_work()` actually canceled pending
work.

## Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem `nvme-multipath`; action verb `put`/implicit
`fix`; intent is to release a module reference when delayed namespace
removal work is canceled.

Step 1.2 Record: tags present: `Reviewed-by: Christoph Hellwig
<hch at lst.de>`, `Reviewed-by: Nilay Shroff <nilay at linux.ibm.com>`,
`Reviewed-by: Chaitanya Kulkarni <kch at nvidia.com>`, `Signed-off-by: John
Garry <john.g.garry at oracle.com>`, `Signed-off-by: Keith Busch
<kbusch at kernel.org>`. No `Fixes:`, `Reported-by:`, `Tested-by:`,
`Link:`, or `Cc: stable` tag in the commit.

Step 1.3 Record: bug is a module reference leak. The commit body states
the delayed disk removal work is canceled when a namespace appears or
reappears, but the module reference acquired in
`nvme_mpath_remove_disk()` is not released. Symptom from related
blktests discussion: module refcount does not return to the original
value after reconnect. Root cause: cancellation bypasses
`nvme_remove_head_work()`, where the existing `module_put()` lives.

Step 1.4 Record: yes, this is a hidden resource/reference leak fix. The
subject does not say “fix leak”, but the body and code show a missing
`module_put()` on a cancellation path.

## Phase 2: Diff Analysis
Step 2.1 Record: one file changed, `drivers/nvme/host/core.c`, with 2
insertions and 1 deletion. Modified function: `nvme_init_ns_head()`.
Scope: single-file surgical fix.

Step 2.2 Record: before, `nvme_init_ns_head()` unconditionally called
`cancel_delayed_work(&head->remove_work)` after adding the namespace
head. After, it checks the return value and calls
`module_put(THIS_MODULE)` only if pending delayed work was canceled. The
affected path is namespace initialization/reappearance after delayed
multipath removal was scheduled.

Step 2.3 Record: bug category is reference counting/resource leak.
`nvme_mpath_remove_disk()` acquires a module reference with
`try_module_get(THIS_MODULE)` before scheduling `head->remove_work`;
`nvme_remove_head_work()` releases it with `module_put(THIS_MODULE)`
when the work runs. If `nvme_init_ns_head()` cancels the pending work,
the callback does not run, so the added `module_put()` balances the
acquired reference. Workqueue docs in `kernel/workqueue.c` verify
`cancel_delayed_work()` returns true if pending work was canceled and
false otherwise.

Step 2.4 Record: fix quality is high. It is minimal and correctly
conditional. Regression risk is very low: an extra `module_put()` is
only performed when the work was pending and canceled, matching the
exact reference lifetime. If the work was not pending or was already
running, no extra put is done.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows the cancellation line in
`nvme_init_ns_head()` was introduced by `dd2c18548964` (`nvme: reset
delayed remove_work after reconnect`), first contained around
`v6.16-rc4`. The delayed removal work and callback `module_put()` came
from `62188639ec16` (`nvme-multipath: introduce delayed removal of the
multipath head node`), first contained around `v6.16-rc1`.

Step 3.2 Record: no `Fixes:` tag is present, so there was no tagged
commit to follow. Manual history points to the interaction between
`62188639ec16` and `dd2c18548964`.

Step 3.3 Record: recent related commits include `62188639ec16`
introducing delayed removal, `dd2c18548964` adding cancellation on
reconnect, and `0f5197ea9a73` fixing a different delayed-removal module
reference failure path. This commit is standalone; it only changes the
cancellation path in `core.c`.

Step 3.4 Record: John Garry has recent NVMe-related commits and reviewed
the related `0f5197ea9a73` fix. The patch was committed by Keith Busch
and reviewed by Christoph Hellwig, both key NVMe maintainers.

Step 3.5 Record: no prerequisite commit beyond the delayed-removal
feature and reconnect cancellation code. The current `core.c` already
includes `<linux/module.h>`, so the new `module_put(THIS_MODULE)`
requires no include change.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 3f150f0f010f2` found the original patch at `
https://patch.msgid.link/20260415155358.1517871-1-
john.g.garry at oracle.com`. `b4 dig -a` found only v1, so the committed
version is the submitted revision. Lore mirror shows three positive
reviews and Keith Busch applying it to `nvme-7.1`.

Step 4.2 Record: `b4 dig -w` shows John Garry, Christoph Hellwig, Keith
Busch, Nilay Shroff, and `linux-nvme` were included. Appropriate
maintainers/reviewers were on the thread.

Step 4.3 Record: no `Reported-by` or bugzilla/syzbot link. Related
blktests thread reported a concrete failure: `module refcount not as
original`; John Garry replied that he had posted the kernel fix for it,
referring to this fix.

Step 4.4 Record: related patch thread for `nvme-multipath: fix leak on
try_module_get failure` contains John Garry identifying this missing
`module_put()` on timer cancellation; Keith Busch agreed the check and
`module_put()` were correct and that `_sync` was unnecessary.

Step 4.5 Record: direct lore stable search was blocked by Anubis; web
search did not find stable-specific objections or reasons not to
backport.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: modified function is `nvme_init_ns_head()`.

Step 5.2 Record: `nvme_init_ns_head()` is called by `nvme_alloc_ns()`.
`nvme_alloc_ns()` is called by `nvme_scan_ns()` when a namespace is
discovered during controller scan. Scan work is queued from namespace
change AEN handling, reset/start paths, sysfs rescan, and ioctl rescan
paths.

Step 5.3 Record: key callees around the bug are `cancel_delayed_work()`,
`module_put()`, `nvme_mpath_remove_disk()`, `try_module_get()`,
`mod_delayed_work()`, and `nvme_remove_head_work()`.

Step 5.4 Record: reachable path is: namespace/path removal ->
`nvme_ns_remove()` -> last path -> `nvme_mpath_remove_disk()` ->
`try_module_get()` + delayed work; later namespace reappears -> scan
path -> `nvme_scan_ns()` -> `nvme_alloc_ns()` -> `nvme_init_ns_head()`
-> cancel delayed work. This is reachable through real NVMe multipath
path loss/reconnect and through the blktests NVMe loop scenario. I did
not verify an unprivileged trigger; the tested path writes
`delayed_removal_secs` via sysfs and manipulates NVMe connectivity.

Step 5.5 Record: similar reference-balancing pattern exists elsewhere in
the kernel, e.g. Bluetooth code checks `cancel_delayed_work()` and drops
a reference only when cancellation succeeds. No additional identical
NVMe missing-put sites were found in the searched files.

## Phase 6: Cross-Referencing And Stable Tree Analysis
Step 6.1 Record: local stable branch checks show `stable/linux-6.16.y`,
`6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y` contain
`cancel_delayed_work(&head->remove_work)`,
`try_module_get(THIS_MODULE)`, `module_put(THIS_MODULE)`, and
`delayed_removal_secs`. `stable/linux-6.12.y` does not contain this
delayed-removal code, so this specific bug is not present there.

Step 6.2 Record: `git apply --check` for the patch succeeds on the
current `stable/linux-7.0.y` checkout. The exact hunk context exists in
local `6.16.y` through `7.0.y` branches, so backport difficulty should
be clean or trivial. I did not create separate worktrees for branch-
specific apply checks.

Step 6.3 Record: no alternative fix for this exact cancellation leak was
found in checked stable branches; the candidate commit itself is not an
ancestor of the checked stable branches.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: subsystem is NVMe host multipath, under
`drivers/nvme/host`. Criticality: important for systems using NVMe
multipath and delayed removal, not universal core-kernel impact.

Step 7.2 Record: subsystem is actively developed; recent history
includes delayed removal introduction, reconnect cancellation, and
related leak fixes. This bug is in relatively new code introduced for
`v6.16`.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: affected users are `CONFIG_NVME_MULTIPATH` users who
enable nonzero `delayed_removal_secs` and experience path loss followed
by namespace/path return before the delayed removal work fires.

Step 8.2 Record: trigger is realistic for NVMe multipath transient path
failures or hot-remove/re-add scenarios. Related blktests exercises this
with NVMe loop. Unprivileged trigger was not verified.

Step 8.3 Record: failure mode is module reference leak, severity medium.
It can prevent `nvme_core` from being unloadable and causes persistent
refcount imbalance; the test discussion shows a concrete refcount
mismatch.

Step 8.4 Record: benefit is moderate for affected NVMe multipath users
because it fixes a real reference leak in a recovery path. Risk is very
low: 2 added lines, no API change, no behavior change except balancing a
reference when pending work is canceled.

## Phase 9: Final Synthesis
Step 9.1 Record: evidence for backporting: real reference leak; concrete
test failure in related blktests discussion; tiny one-function fix;
reviewed by three NVMe developers; maintainer agreed the logic; applies
cleanly to current `7.0.y` and relevant code exists in `6.16+` stable
branches. Evidence against: affects a specific configured NVMe multipath
feature, not all systems; severity is not crash/data
corruption/security. Unresolved: I did not verify exact branch-specific
application in separate worktrees, and direct lore stable search was
blocked.

Step 9.2 Record: stable rules checklist:
1. Obviously correct and tested: yes, by code inspection and related
   blktests discussion; no `Tested-by` tag.
2. Fixes real bug: yes, module reference leak.
3. Important issue: yes enough for stable as a driver lifecycle
   reference leak, though severity is medium rather than critical.
4. Small and contained: yes, 2 insertions/1 deletion in one function.
5. No new feature/API: yes.
6. Can apply to stable: yes for current `7.0.y`; exact context exists in
   checked `6.16+` branches. Not applicable to trees lacking delayed
   removal.

Step 9.3 Record: no exception category such as device ID, quirk, DT,
build, or documentation. This is a normal bug fix.

Step 9.4 Record: backport recommended to stable trees containing NVMe
delayed removal cancellation, notably checked `6.16.y` through `7.0.y`;
not applicable to older trees where the feature is absent.

## Verification
- [Phase 1] Parsed commit `3f150f0f010f2` with `git show --format=fuller
  --stat --patch`; confirmed subject, body, tags, and 2-line fix.
- [Phase 2] Inspected `drivers/nvme/host/core.c` and
  `drivers/nvme/host/multipath.c`; confirmed `try_module_get()` in
  `nvme_mpath_remove_disk()` and `module_put()` in
  `nvme_remove_head_work()`.
- [Phase 2] Checked `kernel/workqueue.c`; confirmed
  `cancel_delayed_work()` returns true if pending work was canceled and
  false otherwise.
- [Phase 3] Ran `git blame` on the changed and related lines; identified
  `dd2c18548964`, `62188639ec16`, and `0f5197ea9a73`.
- [Phase 3] Ran `git show` on related commits; confirmed delayed removal
  introduction, reconnect cancellation, and prior try-module failure
  fix.
- [Phase 4] Ran `b4 dig -c`, `-a`, and `-w`; found original patch URL,
  single v1 revision, and appropriate NVMe recipients.
- [Phase 4] Fetched lore mirror thread; confirmed reviews by Christoph
  Hellwig, Nilay Shroff, Chaitanya Kulkarni, and application by Keith
  Busch.
- [Phase 4] Fetched related try-module failure and blktests threads;
  confirmed this missing cancellation `module_put()` was discussed and
  tied to module refcount mismatch.
- [Phase 5] Used `rg` and file reads to trace callers from
  `nvme_init_ns_head()` through `nvme_alloc_ns()`, `nvme_scan_ns()`, and
  `nvme_scan_work()`.
- [Phase 6] Used `git merge-base --is-ancestor` and branch object
  inspection to confirm the relevant code exists in local `6.16.y`,
  `6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y`, but not `6.12.y`.
- [Phase 6] Ran `git apply --check` for the candidate diff on current
  `stable/linux-7.0.y`; it succeeded.
- UNVERIFIED: exact `git apply --check` on separate worktrees for every
  stable branch; unprivileged triggerability; direct stable-list search
  results due lore Anubis blocking.

**YES**

 drivers/nvme/host/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ac5a7d9781f58..2e8402513189d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4052,7 +4052,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
 	mutex_unlock(&ctrl->subsys->lock);
 
 #ifdef CONFIG_NVME_MULTIPATH
-	cancel_delayed_work(&head->remove_work);
+	if (cancel_delayed_work(&head->remove_work))
+		module_put(THIS_MODULE);
 #endif
 	return 0;
 
-- 
2.53.0




More information about the Linux-nvme mailing list