[PATCH] nvme-core: fix io interrupt when work with dm-multipah

Thu Aug 6 11:59:41 EDT 2020

Thanks for cc'ing me on this thread.   The email from this DL usually goes into a folder
so I don't watch it too closely.  Cc me directly if you need my attention. That way the
email shows up at the top of my inbox.

>> On 8/5/20, 2:40 AM, "Chao Leng" <lengchao at huawei.com> wrote:

>> John, Wat do you think about if delete translate NVME_SC_CMD_INTERRUPTED
>> to BLK_STS_TARGET? Thank you.

I think returning to BLK_STS_TARGET for NVME_SC_CMD_INTERRUPTED and NVME_SC_NS_NOT_READY
was a potentially non-backwards compatible change that Keith made in response to a problem that
I reported in patch 35038bffa87da.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=35038bffa87da

I don’t think it would be a mistake to back out this change.  We've had a number of problems
in the distributions with dm-mp and other legacy code getting confused by this change.

>> BLK_STS_TARGET means target has critical error. NVME_SC_CMD_INTERRUPTED
>> just means target need retry io. It is not suitable to translate
>> NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET. Maybe translate to
>> BLK_STS_IOERR is also not suitable, we should translate
>> NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
>> We can do like this:
>
> BLK_STS_AGAIN is a bad choice as we use it for calls that block when
> the callers asked for non-blocking submission.  I'm really not sure
> we want to change anything here - the error definition clearly states
> it is not a failure but a request to retry later.

Agreed. The historical behavior in nvme-core has been to return BLK_STS_IOERR status for all new
or unknown NVMe errors. 

.   On 8/6/20, 10:26 AM, "Keith Busch" <kbusch at kernel.org> wrote:

    > On Thu, Aug 06, 2020 at 01:52:42PM +0800, Chao Leng wrote:
    > NVME_SC_LBA_RANGE should failfast, because retry can not success.

    The DNR bit is how NVMe conveys such things.

I think that NVME_SC_LBA_RANGE has historically returned BLK_STS_TARGET so that
shouldn't be changed.  I agree that DNR - and now the new ACRE mechanism - is what
should control the command retry behavior in NVMe.  

    > NVME_SC_NS_NOT_READY may retry success, but the probality is very low.
    > NVME_SC_CMD_INTERRUPTED need retry, according to protocol define, retry will success.

    If the upper layers set a request such that "noretry" is true, that's
    the behavior you're going to get. Sprinkling special exceptions around
    is not a good idea.

I agree. I think the problem here is that the current BLK_STS and FAST_FAIL mechanisms
were designed support legacy protocols like SCSI.  They assume that all retry behavior is
controlled by other components in the stack.  NVMe is presenting new protocol features
and semantics which probably can't be effectively supported by those legacy BLK_STS
and FAST_FAIL mechanisms without passing more information up the stack.

So I think the best approach is to maintain backwards compatibility with
the existing BLK_STS codes and semantics or create new errors and mechanisms
which will require new software to handle those new error codes all the way
up the stack.   But don't overload things.

John Meneghini
ONTAP SAN Target Architect
johnm at netapp.com