nvme-core: fix io interrupt when work with dm-multipath

Thu Aug 6 14:40:58 EDT 2020

On Thu, Aug 06 2020 at 12:17pm -0400,
Meneghini, John <John.Meneghini at netapp.com> wrote:

> On 8/6/20, 11:59 AM, "Meneghini, John" <John.Meneghini at netapp.com> wrote:
> 
>     Maybe translate to
>     >> BLK_STS_IOERR is also not suitable, we should translate
>     >> NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
> 
> I think this depends upon what the error handling is up the stack for BLK_STS_IOERR.
> 
> What does DM do with BLK_STS_IOERR?

DM treats it as retryable.  See blk_path_error().

>     > BLK_STS_AGAIN is a bad choice as we use it for calls that block when
>     > the callers asked for non-blocking submission.  I'm really not sure
>     > we want to change anything here - the error definition clearly states
>     > it is not a failure but a request to retry later.
> 
> So it sounds like you may need a new BLK_STS error.   However, even if you add
> a new error, that's not going to be enough to communicate the CRDT or DNR 
> information up the stack.
>  
> } blk_errors[] = {
>         [BLK_STS_OK]            = { 0,          "" },
>         [BLK_STS_NOTSUPP]       = { -EOPNOTSUPP, "operation not supported" },
>         [BLK_STS_TIMEOUT]       = { -ETIMEDOUT, "timeout" },
>         [BLK_STS_NOSPC]         = { -ENOSPC,    "critical space allocation" },
>         [BLK_STS_TRANSPORT]     = { -ENOLINK,   "recoverable transport" },
>         [BLK_STS_TARGET]        = { -EREMOTEIO, "critical target" },
>         [BLK_STS_NEXUS]         = { -EBADE,     "critical nexus" },
>         [BLK_STS_MEDIUM]        = { -ENODATA,   "critical medium" },
>         [BLK_STS_PROTECTION]    = { -EILSEQ,    "protection" },
>         [BLK_STS_RESOURCE]      = { -ENOMEM,    "kernel resource" },
>         [BLK_STS_DEV_RESOURCE]  = { -EBUSY,     "device resource" },
>         [BLK_STS_AGAIN]         = { -EAGAIN,    "nonblocking retry" },
> 
>         /* device mapper special case, should not leak out: */
>         [BLK_STS_DM_REQUEUE]    = { -EREMCHG, "dm internal retry" },
> 
>         /* everything else not covered above: */
>         [BLK_STS_IOERR]         = { -EIO,       "I/O" },
> };
> 

We've yet to determine how important it is that the target-provided
delay information be honored...

In any case, NVMe translating NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET
is definitely wrong.  That status tells upper layers the error is not
retryable (see blk_path_error()).
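
For reference, here is a rough userspace sketch of the classification
that blk_path_error() performs, as I understand it.  The enum is a
stand-in for the kernel's blk_status_t (its numeric values are
illustrative only and are not the kernel's), and the exact set of cases
should be checked against the tree you are working on:

```c
#include <stdbool.h>

/*
 * Stand-in for the kernel's blk_status_t so this compiles in userspace.
 * ASSUMPTION: numbering is illustrative and does not match the kernel.
 */
typedef enum {
	BLK_STS_OK = 0,
	BLK_STS_NOTSUPP,
	BLK_STS_TIMEOUT,
	BLK_STS_NOSPC,
	BLK_STS_TRANSPORT,
	BLK_STS_TARGET,
	BLK_STS_NEXUS,
	BLK_STS_MEDIUM,
	BLK_STS_PROTECTION,
	BLK_STS_RESOURCE,
	BLK_STS_DEV_RESOURCE,
	BLK_STS_AGAIN,
	BLK_STS_DM_REQUEUE,
	BLK_STS_IOERR,
} blk_status_t;

/*
 * Returns false for statuses that describe a problem with the target or
 * the request itself (retrying on another path would not help), and true
 * for everything else, which is treated as a possible path failure.
 */
static inline bool blk_path_error(blk_status_t error)
{
	switch (error) {
	case BLK_STS_NOTSUPP:
	case BLK_STS_NOSPC:
	case BLK_STS_TARGET:
	case BLK_STS_NEXUS:
	case BLK_STS_MEDIUM:
	case BLK_STS_PROTECTION:
		return false;
	default:
		/* Anything else could be a path failure, so it is retried. */
		return true;
	}
}
```

With that classification, BLK_STS_TARGET lands in the "do not retry"
group, while BLK_STS_IOERR, BLK_STS_RESOURCE and BLK_STS_DEV_RESOURCE
all fall through to the retryable default.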

Shouldn't NVMe translate NVME_SC_CMD_INTERRUPTED to BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE?

DM will retry immediately if BLK_STS_RESOURCE is returned.
DM will delay a fixed 100ms if BLK_STS_DEV_RESOURCE is returned.

(Ming said BLK_STS_RESOURCE isn't Linux specific and can be used by
drivers)