[PATCH AUTOSEL 6.18-6.12] nvme-fabrics: add ENOKEY to no retry criteria for authentication failures

Sasha Levin sashal at kernel.org
Thu Dec 11 18:08:58 PST 2025


From: Justin Tee <justintee8345 at gmail.com>

[ Upstream commit 13989207ee29c40501e719512e8dc90768325895 ]

With authentication, in addition to EKEYREJECTED there is also no point in
retrying reconnects when status is ENOKEY.  Thus, add -ENOKEY as another
criteria to determine when to stop retries.

Cc: Daniel Wagner <wagi at kernel.org>
Cc: Hannes Reinecke <hare at suse.de>
Closes: https://lore.kernel.org/linux-nvme/20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org/
Signed-off-by: Justin Tee <justintee8345 at gmail.com>
Tested-by: Daniel Wagner <wagi at kernel.org>
Reviewed-by: Daniel Wagner <wagi at kernel.org>
Reviewed-by: Hannes Reinecke <hare at suse.de>
Signed-off-by: Keith Busch <kbusch at kernel.org>
Signed-off-by: Sasha Levin <sashal at kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Comprehensive Analysis

### 1. COMMIT MESSAGE ANALYSIS
- **Purpose**: Add `-ENOKEY` to the no-retry criteria so that reconnects
  are not retried pointlessly when the authentication key is missing
- **Tags**: `Tested-by` (Daniel Wagner), `Reviewed-by` (Daniel Wagner,
  Hannes Reinecke), `Closes:` (lore link)
- **Missing**: No explicit `Cc: stable at vger.kernel.org` or `Fixes:` tag
- **Maintainer signoff**: Keith Busch (NVMe maintainer)

### 2. CODE CHANGE ANALYSIS

The change is minimal: a single-line modification in
`nvmf_should_reconnect()`:
```c
-	if (status == -EKEYREJECTED)
+	if (status == -EKEYREJECTED || status == -ENOKEY)
```

**Where `-ENOKEY` is returned:**
- `drivers/nvme/host/auth.c:720` - No session key negotiated
- `drivers/nvme/host/auth.c:973` - No host key (`ctrl->host_key` is
  NULL)
- `drivers/nvme/host/auth.c:978` - Controller key configured but invalid
- `drivers/nvme/host/tcp.c:1698,2080,2112,2121` - Various TLS/PSK key
  failures

All of these represent "key does not exist" scenarios in which retrying
cannot help.

**Function impact:** `nvmf_should_reconnect()` is called by all three
NVMe fabric transports (TCP, FC, RDMA) via
`nvme_tcp_reconnect_or_remove()`, `nvme_fc_reconnect_or_delete()`, and
`nvme_rdma_reconnect_or_remove()`.
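
The decision logic described above can be modelled in user space. The
following is a minimal sketch, not the kernel code: `struct sketch_ctrl`,
`sketch_should_reconnect()` and `SKETCH_STATUS_DNR` are hypothetical names
introduced for illustration, the DNR bit value is an assumption, and the
final reconnect-limit check is an assumed completion of the condition that
the diff context only shows in part. It also assumes Linux errno values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Fallbacks for non-Linux hosts; values taken from Linux asm-generic/errno.h. */
#ifndef ENOKEY
#define ENOKEY 126
#endif
#ifndef EKEYREJECTED
#define EKEYREJECTED 129
#endif

/* Assumption: stand-in for the kernel's "do not retry" status bit. */
#define SKETCH_STATUS_DNR 0x4000

/* Hypothetical stand-in for the reconnect-related fields of the controller. */
struct sketch_ctrl {
	int max_reconnects;	/* -1 means "retry forever" */
	int nr_reconnects;	/* reconnect attempts made so far */
};

static bool sketch_should_reconnect(const struct sketch_ctrl *ctrl, int status)
{
	/* Positive values are NVMe status codes; the DNR bit forbids retrying. */
	if (status > 0 && (status & SKETCH_STATUS_DNR))
		return false;

	/* Key errors are configuration problems that a retry cannot fix. */
	if (status == -EKEYREJECTED || status == -ENOKEY)
		return false;

	/* Assumed remainder: keep retrying while under the configured limit. */
	if (ctrl->max_reconnects == -1 ||
	    ctrl->nr_reconnects < ctrl->max_reconnects)
		return true;

	return false;
}
```

In this model, a controller with `max_reconnects == -1` would retry forever
on `-ENOKEY` without the added condition; with it, the first failed attempt
ends the reconnect cycle, while transient errors such as `-ECONNREFUSED` are
still retried.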

### 3. CLASSIFICATION
- **Bug fix**: Yes - fixes futile retry behavior
- **New feature**: No - extends existing error handling pattern
- **Follows established pattern**: The `-EKEYREJECTED` check was added
  in v6.10 (commit 0e34bd9605f6c) with identical logic

### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed**: 1
- **Files touched**: 1
- **Complexity**: Trivial
- **Risk**: Extremely low - change only affects reconnect decision for
  an already-failed authentication
- **Regression potential**: Near zero - the code path only executes when
  authentication already failed

### 5. USER IMPACT
- **Who is affected**: Users of NVMe Fabrics (TCP/RDMA/FC) with
  authentication enabled
- **Severity without fix**: Wasteful reconnect retries, potential log
  spam, resource consumption
- **Not a crash/data corruption**: This is a behavioral improvement, not
  a critical fix

### 6. STABILITY INDICATORS
- Tested by Daniel Wagner (NVMe developer)
- Reviewed by Daniel Wagner and Hannes Reinecke (both storage/NVMe
  experts)
- Clean, simple change with clear semantics

### 7. DEPENDENCY CHECK
- Requires commit `0e34bd9605f6c` ("nvme: do not retry authentication
  failures") from v6.10
- NVMe authentication feature itself was added in v6.1 (`f50fff73d620c`)
- Backport applies cleanly to trees with the `-EKEYREJECTED` check

### Decision Rationale

**Pros for backporting:**
- Trivial one-line change with near-zero regression risk
- Fixes real wasteful behavior (pointless retries that can never
  succeed)
- Follows existing code pattern already established
- Reviewed and tested by domain experts
- Semantically correct: `-ENOKEY` means "no key available", and retrying
  won't create one

**Cons for backporting:**
- No explicit `Cc: stable at vger.kernel.org` tag from maintainers
- Not a crash, security bug, or data corruption fix
- NVMe authentication is a relatively niche feature
- Bug impact is resource waste, not functional failure

**Conclusion:**
This is a low-risk, obviously correct bug fix that prevents wasteful
behavior. While it lacks explicit stable tags and isn't fixing a
critical bug, the change is so simple and safe that the benefit-to-risk
ratio strongly favors inclusion. The fix completes the authentication
error handling that was started with the `-EKEYREJECTED` check, making
it a natural complement to that existing code. Stable kernel users with
NVMe authentication would benefit from not having pointless reconnection
storms when their keys are missing.

**YES**

 drivers/nvme/host/fabrics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 2e58a7ce10905..55a8afd2efd50 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -592,7 +592,7 @@ bool nvmf_should_reconnect(struct nvme_ctrl *ctrl, int status)
 	if (status > 0 && (status & NVME_STATUS_DNR))
 		return false;
 
-	if (status == -EKEYREJECTED)
+	if (status == -EKEYREJECTED || status == -ENOKEY)
 		return false;
 
 	if (ctrl->opts->max_reconnects == -1 ||
-- 
2.51.0



