[PATCHv3 3/5] nvme: implement I/O Command Sets Command Set support
Keith Busch
kbusch at kernel.org
Wed Jun 24 17:49:16 EDT 2020
On Wed, Jun 24, 2020 at 12:03:41PM -0700, Sagi Grimberg wrote:
>
> > > > > If the controller does not support the CNS value, it may return Invalid
> > > > > Field with DNR set. That error currently gets propogated back to
> > > > > nvme_init_ns_head(), which then abandons the namespace. Just as the code
> > > > > coments say, we had been historically been clearing such errors because
> > > > > we have other ways to identify the namespace, but now we're not clearing
> > > > > that error.
> > > >
> > > > I don't understand what you are saying Keith.
> > > >
> > > > My comment was for this block:
> > > > --
> > > > if (!status && nvme_multi_css(ctrl) && !csi_seen) {
> > > > dev_warn(ctrl->device, "Command set not reported for nsid:%d\n",
> > > > nsid);
> > > > status = -EINVAL;
> > > > }
> > > > --
> > > >
> > > > I was saying that !status doesn't necessarily mean success, but it can
> > > > also mean that its an retry-able error status (due to transport or
> > > > controller). If we see a retry-able error we should still clear the
> > > > status so we don't abandon the namespace.
> > > >
> > > > This for example would achieve that:
> >
> > We're not talking about the same thing. I am only talking about what
> > introduced the DNR check, from commit 59c7c3caaaf87.
> >
> > I know you added it because you want to abort comparing identifiers on a
> > rescan when retrieving the identifiers failed. That's fine, but I am
> > saying this fails namespace creation in the first place for some types
> > of devices that used to succeed.
>
> OK, now I think I understand (thanks for clarifying that the discussion
> is not on patch 3/5, but rather on 59c7c3caaaf87).
>
> So the original proposal was to check NVME_SC_HOST_PATH_ERROR (and now
> we have NVME_SC_HOST_ABORTED_CMD) but with the review Christoph gave
> it seemed to make more sense that we generalize and check the DNR bit.
Okay, I didn't question this approach when it first went through, so
sorry about this digression, but I really don't get how this DNR check
helps at all.
The code currently returns an error if DNR is set. Based on the commit
messages, it sounds like you need that error to skip comparing
identifiers through nvme's scan_work calling revalidate_disk():
suppressing the error has revalidate_disk() fail with -ENODEV when
comparing identifiers fails.
Since we do return the error when DNR is set, we skip comparing
identifiers and return blk_status_to_errno(nvme_error_status(ret))
instead. How is that errno an improvement?
And then if DNR is not set, we suppress the error and proceed with
comparing identifiers??
More information about the Linux-nvme
mailing list