[PATCH] nvme-core: fix io interrupt when work with dm-multipah

Thu Aug 6 20:03:30 EDT 2020

>>> John, Wat do you think about if delete translate NVME_SC_CMD_INTERRUPTED
>>> to BLK_STS_TARGET? Thank you.
> 
> I think returning to BLK_STS_TARGET for NVME_SC_CMD_INTERRUPTED and NVME_SC_NS_NOT_READY
> was a potentially non-backwards compatible change that Keith made in response to a problem that
> I reported in patch 35038bffa87da.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=35038bffa87da
> 
> I don’t think it would be a mistake to back out this change.  We've had a number of problems
> in the distributions with dm-mp and other legacy code getting confused by this change.

 From what I see the point of this change was to have these status codes
to do exactly what they need to do, use the local retry flow.

>>> BLK_STS_TARGET means target has critical error. NVME_SC_CMD_INTERRUPTED
>>> just means target need retry io. It is not suitable to translate
>>> NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET. Maybe translate to
>>> BLK_STS_IOERR is also not suitable, we should translate
>>> NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
>>> We can do like this:
>>
>> BLK_STS_AGAIN is a bad choice as we use it for calls that block when
>> the callers asked for non-blocking submission.  I'm really not sure
>> we want to change anything here - the error definition clearly states
>> it is not a failure but a request to retry later.
> 
> Agreed. The historical behavior in nvme-core has been to return BLK_STS_IOERR status for all new
> or unknown NVMe errors.

Which is the correct thing to do.

> .   On 8/6/20, 10:26 AM, "Keith Busch" <kbusch at kernel.org> wrote:
> 
>      > On Thu, Aug 06, 2020 at 01:52:42PM +0800, Chao Leng wrote:
>      > NVME_SC_LBA_RANGE should failfast, because retry can not success.
> 
>      The DNR bit is how NVMe conveys such things.
> 
> I think that NVME_SC_LBA_RANGE has historically returned BLK_STS_TARGET so that
> shouldn't be changed.  I agree that DNR - and now the new ACRE mechanism - is what
> should control the command retry behavior in NVMe.

Completely agree here too.

>      > NVME_SC_NS_NOT_READY may retry success, but the probality is very low.
>      > NVME_SC_CMD_INTERRUPTED need retry, according to protocol define, retry will success.
> 
>      If the upper layers set a request such that "noretry" is true, that's
>      the behavior you're going to get. Sprinkling special exceptions around
>      is not a good idea.
> 
> I agree.

I also 100% agree here.

> I think the problem here is that the current BLK_STS and FAST_FAIL mechanisms
> were designed support legacy protocols like SCSI.  They assume that all retry behavior is
> controlled by other components in the stack.  NVMe is presenting new protocol features
> and semantics which probably can't be effectively supported by those legacy BLK_STS
> and FAST_FAIL mechanisms without passing more information up the stack.

Not sure how generic this new blk status would be.. It would probably
make a lot more sense of there are other consumers for such a status
code.

Maybe we could set it to BLK_STS_TIMEOUT with a big fat comment for why
we are doing this...