[RFC PATCH 08/14] nvme: Implement cross-controller reset recovery

Sagi Grimberg sagi at grimberg.me
Sun Jan 4 13:14:38 PST 2026



On 31/12/2025 2:04, Randy Jennings wrote:
>>> +
>>> +             if (!ret) {
>>> +                     dev_info(ictrl->device, "CCR succeeded using %s\n",
>>> +                              dev_name(sctrl->device));
>>> +                     blk_put_queue(sctrl->admin_q);
>>> +                     nvme_put_ctrl(sctrl);
>>> +                     return 0;
>>> +             }
>>> +
>>> +             /* Try another controller */
>>> +             min_cntlid = sctrl->cntlid + 1;
>> OK, I see why min_cntlid is used. That is very non-intuitive.
>>
>> I'm wondering if it would be simpler to take one shot at CCR and,
>> if it fails, fall back to CRT. I mean, if the sctrl is alive and it was
>> unable to reset the ictrl in time, how would another ctrl do a better
>> job here?
> There are many different kinds of failures we are dealing with here
> that result in a dropped connection (association).  It could be a problem
> with the specific link, or it could be that the node of an HA pair in the
> storage array went down.  In the case of a specific link problem, maybe
> only one of the connections is down and any controller would work.
> In the case of the node of an HA pair, roughly half of the connections
> are going down, and there is a race between the controllers which
> are detected down first.  There were some heuristics put into the
> spec about deciding which controller to use, but that is more code
> and a refinement that could come later (and they are still heuristics;
> they may not be helpful).
>
> Because CCR offers a significant win of shortening the recovery time
> substantially, it is worth retrying on the other controllers. This time
> affects when we can start retrying IO.  KATO is in seconds, and
> NVMEoF should have the capability of doing a significant amount of
> IOs in each of those seconds.

But it doesn't actually do I/O; it issues I/O and then waits for it to
time out.

>
> Besides, the alternative is just to wait.  Might as well be actively trying
> to shorten that wait time.  Besides a small increase in code complexity,
> is there a downside to doing so?

Simplicity is very important when it comes to non-trivial code paths 
like error recovery.
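
To make the trade-off concrete, the two options under discussion can be
sketched roughly as below. This is a standalone illustration, not the
patch code: struct fake_ctrl, find_source(), ccr_retry_all() and
ccr_one_shot() are hypothetical names, and the ccr_ok flag stands in for
whatever the real CCR command would return.

```c
#include <stddef.h>

/* Hypothetical stand-in for a candidate source controller (sctrl). */
struct fake_ctrl {
    int cntlid;   /* controller ID */
    int alive;    /* nonzero if its association is still up */
    int ccr_ok;   /* nonzero if a CCR issued through it would succeed */
};

/* Return the live controller with the lowest cntlid >= min_cntlid, or NULL. */
static const struct fake_ctrl *find_source(const struct fake_ctrl *c,
                                           size_t n, int min_cntlid)
{
    const struct fake_ctrl *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!c[i].alive || c[i].cntlid < min_cntlid)
            continue;
        if (!best || c[i].cntlid < best->cntlid)
            best = &c[i];
    }
    return best;
}

/*
 * Retry variant: walk the candidates in cntlid order until one succeeds.
 * Returns 0 on success, -1 once every candidate has been tried.
 * *attempts is set to the number of CCRs issued.
 */
static int ccr_retry_all(const struct fake_ctrl *c, size_t n, int *attempts)
{
    const struct fake_ctrl *s;
    int min_cntlid = 0;

    *attempts = 0;
    while ((s = find_source(c, n, min_cntlid)) != NULL) {
        (*attempts)++;
        if (s->ccr_ok)
            return 0;
        min_cntlid = s->cntlid + 1;  /* try another controller */
    }
    return -1;  /* caller falls back to CRT */
}

/*
 * One-shot variant: issue a single CCR through the first candidate and
 * fall back immediately if it fails.
 */
static int ccr_one_shot(const struct fake_ctrl *c, size_t n, int *attempts)
{
    const struct fake_ctrl *s = find_source(c, n, 0);

    *attempts = 0;
    if (!s)
        return -1;
    *attempts = 1;
    return s->ccr_ok ? 0 : -1;
}
```

The retry variant carries the min_cntlid cursor and a loop whose
termination depends on the candidate set; the one-shot variant has a
single decision point, at the cost of giving up when the first candidate
happens to be a bad choice.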


