[PATCH v4 08/15] nvme: Implement cross-controller reset recovery
Randy Jennings
randyj at purestorage.com
Fri Apr 24 16:07:55 PDT 2026
On Fri, Mar 27, 2026 at 5:46 PM Mohamed Khalfella
<mkhalfella at purestorage.com> wrote:
> Signed-off-by: Mohamed Khalfella <mkhalfella at purestorage.com>
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> +int nvme_fence_ctrl(struct nvme_ctrl *ictrl)
> +{
> +	unsigned long deadline, timeout;
> +	struct nvme_ctrl *sctrl;
> +	u32 min_cntlid = 0;
> +	int ret;
> +
> +	timeout = nvme_fence_timeout_ms(ictrl);
> +	dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
> +
> +	deadline = jiffies + msecs_to_jiffies(timeout);
> +	while (time_is_after_jiffies(deadline)) {
> +		sctrl = nvme_find_ctrl_ccr(ictrl, min_cntlid);
> +		if (!sctrl) {
> +			dev_dbg(ictrl->device,
> +				"failed to find source controller\n");
> +			return -EIO;
> +		}
> +
> +		ret = nvme_issue_wait_ccr(sctrl, ictrl, deadline);
> +		if (!ret) {
> +			dev_info(ictrl->device, "CCR succeeded using %s\n",
> +				 dev_name(sctrl->device));
> +			nvme_put_ctrl_ccr(sctrl);
> +			return 0;
> +		}
> +
> +		min_cntlid = sctrl->cntlid + 1;
> +		nvme_put_ctrl_ccr(sctrl);
> +
If we remove this code from here
> +		if (ret == -EIO) /* CCR command failed */
> +			continue;
> +
> +		/* CCR operation failed or timed out */
> +		return ret;
to here, failed CCR operations (not just failed CCR commands)
will be retried on the next source controller until we run out
of controllers or time. This matters when a source controller
cannot handle a CCR for some other controllers. Sagi, you
requested that we not retry the CCR operation on another
controller, and I told you that this was affecting Igor's and
my testing. May we please remove this code?
> +	}
> +
> +	dev_info(ictrl->device, "CCR operation timeout\n");
> +	return -ETIMEDOUT;
> +}
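
To make the difference concrete, here is a rough userspace model of the
loop. It is only a sketch, not kernel code. Everything named model_* is
hypothetical: struct model_ctrl stands in for struct nvme_ctrl, the
attempt budget stands in for the jiffies deadline, and -ENODEV is just a
placeholder for whatever a failed CCR operation returns. The retry_all
flag selects between the v4 behavior (0) and the behavior with the check
removed (1).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a source controller (not struct nvme_ctrl). */
struct model_ctrl {
	unsigned int cntlid;
	int ccr_result;	/* result of a CCR issued through this controller */
};

/*
 * Models nvme_find_ctrl_ccr(): return the first controller whose cntlid
 * is >= min_cntlid, or NULL. Assumes the array is sorted by cntlid.
 */
static const struct model_ctrl *
model_find_ctrl(const struct model_ctrl *ctrls, size_t n,
		unsigned int min_cntlid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ctrls[i].cntlid >= min_cntlid)
			return &ctrls[i];
	return NULL;
}

/*
 * Models the loop in nvme_fence_ctrl(). "budget" is the number of
 * attempts allowed, an assumed stand-in for the time deadline.
 */
static int model_fence(const struct model_ctrl *ctrls, size_t n,
		       unsigned int budget, int retry_all)
{
	unsigned int min_cntlid = 0;
	const struct model_ctrl *sctrl;
	int ret;

	assert(ctrls != NULL || n == 0);

	for (; budget > 0; budget--) {
		sctrl = model_find_ctrl(ctrls, n, min_cntlid);
		if (!sctrl)
			return -EIO;	/* ran out of source controllers */

		ret = sctrl->ccr_result;
		if (!ret)
			return 0;	/* CCR succeeded */

		min_cntlid = sctrl->cntlid + 1;

		/* v4: only a failed CCR command (-EIO) moves on. */
		if (!retry_all && ret != -EIO)
			return ret;
	}

	return -ETIMEDOUT;		/* ran out of budget */
}
```

With two controllers where the first fails the CCR operation and the
second would succeed, model_fence() returns the first controller's error
when retry_all is 0 and returns 0 when retry_all is 1. That is the case
Igor and I keep hitting in testing.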
Sincerely,
Randy Jennings