[PATCH] nvme: restore use of blk_path_error() in nvme_complete_rq()

Thu Aug 6 15:19:43 EDT 2020

On Thu, Aug 06 2020 at  2:40pm -0400,
Mike Snitzer <snitzer at redhat.com> wrote:

> On Thu, Aug 06 2020 at 12:17pm -0400,
> Meneghini, John <John.Meneghini at netapp.com> wrote:
> 
> > On 8/6/20, 11:59 AM, "Meneghini, John" <John.Meneghini at netapp.com> wrote:
> > 
> >     Maybe translate to
> >     >> BLK_STS_IOERR is also not suitable, we should translate
> >     >> NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
> > 
> > I think this depends upon what the error handling is up the stack for BLK_STS_IOERR.
> > 
> > What does DM do with BLK_STS_IOERR?
> 
> DM treats it as retryable.  See blk_path_error().
> 
> >     > BLK_STS_AGAIN is a bad choice as we use it for calls that block when
> >     > the callers asked for non-blocking submission.  I'm really not sure
> >     > we want to change anything here - the error definition clearly states
> >     > it is not a failure but a request to retry later.
> > 
> > So it sounds like you may need a new BLK_STS error.   However, even if you add
> > a new error, that's not going to be enough to communicate the CRDT or DNR 
> > information up the stack.
> >  
> > } blk_errors[] = {
> >         [BLK_STS_OK]            = { 0,          "" },
> >         [BLK_STS_NOTSUPP]       = { -EOPNOTSUPP, "operation not supported" },
> >         [BLK_STS_TIMEOUT]       = { -ETIMEDOUT, "timeout" },
> >         [BLK_STS_NOSPC]         = { -ENOSPC,    "critical space allocation" },
> >         [BLK_STS_TRANSPORT]     = { -ENOLINK,   "recoverable transport" },
> >         [BLK_STS_TARGET]        = { -EREMOTEIO, "critical target" },
> >         [BLK_STS_NEXUS]         = { -EBADE,     "critical nexus" },
> >         [BLK_STS_MEDIUM]        = { -ENODATA,   "critical medium" },
> >         [BLK_STS_PROTECTION]    = { -EILSEQ,    "protection" },
> >         [BLK_STS_RESOURCE]      = { -ENOMEM,    "kernel resource" },
> >         [BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
> >         [BLK_STS_AGAIN]         = { -EAGAIN,    "nonblocking retry" },
> > 
> >         /* device mapper special case, should not leak out: */
> >         [BLK_STS_DM_REQUEUE]    = { -EREMCHG, "dm internal retry" },
> > 
> >         /* everything else not covered above: */
> >         [BLK_STS_IOERR]         = { -EIO,       "I/O" },
> > };
> > 
> 
> We've yet to determine how important it is that the target provided
> delay information be honored...
> 
> In any case, NVMe translating NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET
> is definitely wrong.  That conveys the error is not retryable (see
> blk_path_error()).
> 
> Shouldn't NVMe translate NVME_SC_CMD_INTERRUPTED to BLK_STS_RESOURCE or
> BLK_STS_DEV_RESOURCE?
> 
> DM will retry immediately if BLK_STS_RESOURCE is returned.
> DM will delay a fixed 100ms if BLK_STS_DEV_RESOURCE is used.

Ngh, I got that inverted.. BLK_STS_RESOURCE will result in the 100ms
delayed retry.  BLK_STS_DEV_RESOURCE results in immediate retry.

But going back to BLK_STS_IOERR by reverting commit 35038bffa87 would
work too.

> (Ming said BLK_STS_RESOURCE isn't Linux [block core] specific and can
> be used by drivers)

Regardless, reading back on this thread, I think there is at least some
consensus about reverting commit 35038bffa87 ("nvme: Translate more
status codes to blk_status_t") ?

And on a related note, building on the thread I started here (but
haven't heard back from any NVMe maintainers on):
https://www.redhat.com/archives/dm-devel/2020-July/msg00051.html
I'd also be happy as a pig in shit if this patch were applied:

From: Mike Snitzer <snitzer at redhat.com>
Date: Thu, 2 Jul 2020 01:43:27 -0400
Subject: [PATCH] nvme: restore use of blk_path_error() in nvme_complete_rq()

Commit 764e9332098c0 ("nvme-multipath: do not reset on unknown
status") removed NVMe's use blk_path_error() -- presummably because
nvme_failover_req() was modified to return whether a command should be
retried or not.

By not using blk_path_error() there is serious potential for
regression for how upper layers (e.g. DM multipath) respond to NVMe's
error conditions.  This has played out now due to commit 35038bffa87
("nvme: Translate more status codes to blk_status_t").  Had NVMe
continued to use blk_path_error() it too would not have retried an
NVMe command that got NVME_SC_CMD_INTERRUPTED.

Fix this potential for NVMe error handling regression, possibly
outside NVMe, by restoring NVMe's use of blk_path_error().

Fixes: 764e9332098c0 ("nvme-multipath: do not reset on unknown status")
Cc: stable at vger.kerneel.org
Signed-off-by: Mike Snitzer <snitzer at redhat.com>
---
 drivers/nvme/host/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6585d57112ad..072f629da4d8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -290,8 +290,13 @@ void nvme_complete_rq(struct request *req)
 		nvme_req(req)->ctrl->comp_seen = true;
 
 	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
-		if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
-			return;
+		if (blk_path_error(status)) {
+			if (req->cmd_flags & REQ_NVME_MPATH) {
+				if (nvme_failover_req(req))
+					return;
+				/* fallthru to normal error handling */
+			}
+		}
 
 		if (!blk_queue_dying(req->q)) {
 			nvme_retry_req(req);
-- 
2.18.0