[PATCH v3] nvme: fix identify error status silent ignore

Sagi Grimberg sagi at grimberg.me
Fri Jun 26 13:46:29 EDT 2020


Patch 59c7c3caaaf8 intended to silently ignore only
non retry-able errors (DNR bit set), so that we can still
identify misbehaving controllers, and on the other hand to
propagate retry-able errors (DNR bit cleared), so that we don't
wrongly abandon a namespace just because it happens to be
temporarily inaccessible.

The goal of this patch is the same as that of the original
commit, which unfortunately had the logic backwards.

Fixes: 59c7c3caaaf8 ("nvme: fix possible hang when ns scanning fails during error recovery")
Reported-by: Keith Busch <kbusch at kernel.org>
Reviewed-by: Keith Busch <kbusch at kernel.org>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
Changes from v2:
- added comment on non-trivial code

Changes from v1:
- remove parentheses
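
For reviewers, a minimal standalone sketch of the decision this patch
is meant to implement. It is not the driver code: the helper name and
the SKETCH_* constants are made up for illustration, and their values
are assumed to mirror NVME_SC_DNR, NVME_SC_INVALID_FIELD and
NVME_SC_HOST_PATH_ERROR in include/linux/nvme.h.

```c
/*
 * Local stand-ins for this sketch only. The values are assumed to
 * match the corresponding NVME_SC_* definitions in
 * include/linux/nvme.h; check them against your tree.
 */
#define SKETCH_SC_DNR              0x4000
#define SKETCH_SC_INVALID_FIELD    0x2
#define SKETCH_SC_HOST_PATH_ERROR  0x370

/*
 * Hypothetical helper: maps the Identify Descriptors status to the
 * value the caller should see.
 *
 *   status > 0, DNR set     -> 0 (not retry-able; ignore, we may
 *                                 already have a NGUID or EUI-64)
 *   status > 0, DNR cleared -> unchanged (retry-able; propagate)
 *   status <= 0             -> unchanged (success or negative errno)
 */
static int sketch_filter_descs_status(int status)
{
	if (status > 0 && (status & SKETCH_SC_DNR))
		return 0;
	return status;
}
```

Under these assumptions, SKETCH_SC_INVALID_FIELD | SKETCH_SC_DNR is
mapped to 0, while SKETCH_SC_HOST_PATH_ERROR (DNR cleared) is returned
unchanged so the revalidation flow can retry later.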

 drivers/nvme/host/core.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2afed32d3892..92dc2327bf3a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1128,9 +1128,15 @@ static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, unsigned nsid,
 			"Identify Descriptors failed (%d)\n", status);
 		 /*
 		  * Don't treat an error as fatal, as we potentially already
-		  * have a NGUID or EUI-64.
+		  * have a NGUID or EUI-64. If we failed with DNR set, we want
+		  * to silently ignore the error as we can still identify
+		  * the device, but if the status has DNR cleared, we want
+		  * to propagate the error back specifically for the disk
+		  * revalidation flow to make sure we don't abandon the
+		  * device just because of a temporary retry-able error (such
+		  * as path or transport errors).
 		  */
-		if (status > 0 && !(status & NVME_SC_DNR))
+		if (status > 0 && status & NVME_SC_DNR)
 			status = 0;
 		goto free_data;
 	}
-- 
2.25.1



