[PATCHv3 3/5] nvme: implement I/O Command Sets Command Set support

Keith Busch kbusch at kernel.org
Wed Jun 24 13:25:09 EDT 2020


On Tue, Jun 23, 2020 at 04:17:30PM -0700, Sagi Grimberg wrote:
> 
> > > > > -		len = nvme_process_ns_desc(ctrl, ids, cur);
> > > > > +		len = nvme_process_ns_desc(ctrl, ids, cur, &csi_seen);
> > > > >    		if (len < 0)
> > > > >    			goto free_data;
> > > > >    		len += sizeof(*cur);
> > > > >    	}
> > > > >    free_data:
> > > > > +	if (!status && nvme_multi_css(ctrl) && !csi_seen) {
> > > > 
> > > > We will clear the status if we detect a path error, that is to
> > > > avoid needlessly removing the ns for path failures, so you should
> > > > check at the goto site.
> > > 
> > > The problem is that this check has to be done after checking all the ns descs,
> > > so this check to be done as the final thing, at least after processing all the
> > > ns descs. No matter if nvme_process_ns_desc() returned an error, or if
> > > simply NVME_NIDT_CSI wasn't part of the ns desc list, so the loop reached the
> > > end without error.
> > > 
> > > Even if the nvme command failed and the status was cleared:
> > > 
> > >                  if (status > 0 && !(status & NVME_SC_DNR))
> > >                          status = 0;
> > 
> > This check is so weird. What has DNR got to do with whether or not we
> > want to continue with this namespace? The commit that adds this says
> > it's to check for a host failed IO, but a controller can just as validly
> > set DNR in its error status, in which case we'd still want clear the
> > status.
> 
> The reason is that if this error is not a DNR error, it means that we
> should retry and succeed, we don't want to remove the namespace.

And what if it is a DNR error? For example, the controller simply
doesn't support this CNS value. While the controller should support it,
we definitely don't need it for NVM command set namespaces.
 
> The problem that this solves is the fact that we have various error
> recovery conditions (interconnect issues, failures, resets) and the
> async works are designed to continue to run, issue I/O and fail. The
> scan work, will revalidate namespaces and remove them if it fails to
> do so, which is inherently wrong if these are simply inaccessible by
> the host at this time.

Relying on a specific bit in the status code seems a bit indirect for
this kind of condition.



More information about the Linux-nvme mailing list