[PATCH] nvme-core: fix io interrupt when work with dm-multipah
Sagi Grimberg
sagi at grimberg.me
Thu Aug 6 20:03:30 EDT 2020
>>> John, Wat do you think about if delete translate NVME_SC_CMD_INTERRUPTED
>>> to BLK_STS_TARGET? Thank you.
>
> I think returning to BLK_STS_TARGET for NVME_SC_CMD_INTERRUPTED and NVME_SC_NS_NOT_READY
> was a potentially non-backwards compatible change that Keith made in response to a problem that
> I reported in patch 35038bffa87da.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=35038bffa87da
>
> I don’t think it would be a mistake to back out this change. We've had a number of problems
> in the distributions with dm-mp and other legacy code getting confused by this change.
From what I see the point of this change was to have these status codes
to do exactly what they need to do, use the local retry flow.
>>> BLK_STS_TARGET means target has critical error. NVME_SC_CMD_INTERRUPTED
>>> just means target need retry io. It is not suitable to translate
>>> NVME_SC_CMD_INTERRUPTED to BLK_STS_TARGET. Maybe translate to
>>> BLK_STS_IOERR is also not suitable, we should translate
>>> NVME_SC_CMD_INTERRUPTED to BLK_STS_AGAIN.
>>> We can do like this:
>>
>> BLK_STS_AGAIN is a bad choice as we use it for calls that block when
>> the callers asked for non-blocking submission. I'm really not sure
>> we want to change anything here - the error definition clearly states
>> it is not a failure but a request to retry later.
>
> Agreed. The historical behavior in nvme-core has been to return BLK_STS_IOERR status for all new
> or unknown NVMe errors.
Which is the correct thing to do.
> . On 8/6/20, 10:26 AM, "Keith Busch" <kbusch at kernel.org> wrote:
>
> > On Thu, Aug 06, 2020 at 01:52:42PM +0800, Chao Leng wrote:
> > NVME_SC_LBA_RANGE should failfast, because retry can not success.
>
> The DNR bit is how NVMe conveys such things.
>
> I think that NVME_SC_LBA_RANGE has historically returned BLK_STS_TARGET so that
> shouldn't be changed. I agree that DNR - and now the new ACRE mechanism - is what
> should control the command retry behavior in NVMe.
Completely agree here too.
> > NVME_SC_NS_NOT_READY may retry success, but the probality is very low.
> > NVME_SC_CMD_INTERRUPTED need retry, according to protocol define, retry will success.
>
> If the upper layers set a request such that "noretry" is true, that's
> the behavior you're going to get. Sprinkling special exceptions around
> is not a good idea.
>
> I agree.
I also 100% agree here.
> I think the problem here is that the current BLK_STS and FAST_FAIL mechanisms
> were designed support legacy protocols like SCSI. They assume that all retry behavior is
> controlled by other components in the stack. NVMe is presenting new protocol features
> and semantics which probably can't be effectively supported by those legacy BLK_STS
> and FAST_FAIL mechanisms without passing more information up the stack.
Not sure how generic this new blk status would be.. It would probably
make a lot more sense of there are other consumers for such a status
code.
Maybe we could set it to BLK_STS_TIMEOUT with a big fat comment for why
we are doing this...
More information about the Linux-nvme
mailing list