[PATCHv3 3/5] nvme: implement I/O Command Sets Command Set support
Sagi Grimberg
sagi at grimberg.me
Tue Jun 23 19:17:30 EDT 2020
>>>> - len = nvme_process_ns_desc(ctrl, ids, cur);
>>>> + len = nvme_process_ns_desc(ctrl, ids, cur, &csi_seen);
>>>> if (len < 0)
>>>> goto free_data;
>>>> len += sizeof(*cur);
>>>> }
>>>> free_data:
>>>> + if (!status && nvme_multi_css(ctrl) && !csi_seen) {
>>>
>>> We will clear the status if we detect a path error, that is to
>>> avoid needlessly removing the ns for path failures, so you should
>>> check at the goto site.
>>
>> The problem is that this check has to be done after checking all the ns descs,
>> so this check to be done as the final thing, at least after processing all the
>> ns descs. No matter if nvme_process_ns_desc() returned an error, or if
>> simply NVME_NIDT_CSI wasn't part of the ns desc list, so the loop reached the
>> end without error.
>>
>> Even if the nvme command failed and the status was cleared:
>>
>> if (status > 0 && !(status & NVME_SC_DNR))
>> status = 0;
>
> This check is so weird. What has DNR got to do with whether or not we
> want to continue with this namespace? The commit that adds this says
> it's to check for a host failed IO, but a controller can just as validly
> set DNR in its error status, in which case we'd still want clear the
> status.
The reason is that if this error is not a DNR error, it means that we
should retry and succeed, we don't want to remove the namespace.
The problem that this solves is the fact that we have various error
recovery conditions (interconnect issues, failures, resets) and the
async works are designed to continue to run, issue I/O and fail. The
scan work, will revalidate namespaces and remove them if it fails to
do so, which is inherently wrong if these are simply inaccessible by
the host at this time.
More information about the Linux-nvme
mailing list