[PATCH v4 08/15] nvme: Implement cross-controller reset recovery

Randy Jennings randyj at purestorage.com
Fri Apr 24 16:07:55 PDT 2026


On Fri, Mar 27, 2026 at 5:46 PM Mohamed Khalfella
<mkhalfella at purestorage.com> wrote:
> Signed-off-by: Mohamed Khalfella <mkhalfella at purestorage.com>

> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c

> +int nvme_fence_ctrl(struct nvme_ctrl *ictrl)
> +{
> +       unsigned long deadline, timeout;
> +       struct nvme_ctrl *sctrl;
> +       u32 min_cntlid = 0;
> +       int ret;
> +
> +       timeout = nvme_fence_timeout_ms(ictrl);
> +       dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
> +
> +       deadline = jiffies + msecs_to_jiffies(timeout);
> +       while (time_is_after_jiffies(deadline)) {
> +               sctrl = nvme_find_ctrl_ccr(ictrl, min_cntlid);
> +               if (!sctrl) {
> +                       dev_dbg(ictrl->device,
> +                               "failed to find source controller\n");
> +                       return -EIO;
> +               }
> +
> +               ret = nvme_issue_wait_ccr(sctrl, ictrl, deadline);
> +               if (!ret) {
> +                       dev_info(ictrl->device, "CCR succeeded using %s\n",
> +                                dev_name(sctrl->device));
> +                       nvme_put_ctrl_ccr(sctrl);
> +                       return 0;
> +               }
> +
> +               min_cntlid = sctrl->cntlid + 1;
> +               nvme_put_ctrl_ccr(sctrl);
> +

If we remove this code from here
> +               if (ret == -EIO) /* CCR command failed */
> +                       continue;
> +
> +               /* CCR operation failed or timed out */
> +               return ret;
to here, failed CCR operations (not just failed CCR commands)
will be retried on the next source controller until we run out
of controllers or time.  This matters when a source controller
cannot handle a CCR for some other controllers.  Sagi, you
requested that we not retry the CCR operation on another
controller, and I told you that this was affecting Igor's and
my testing.  May we please remove this code?
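
To make the difference concrete, here is a rough user-space model of the
loop, not kernel code.  Everything named mock_* is hypothetical.  It assumes
that nvme_find_ctrl_ccr() returns the first usable controller with
cntlid >= min_cntlid, that a failed CCR operation surfaces as some errno
other than -EIO (-ETIMEDOUT is used here purely for illustration), and it
replaces the jiffies deadline with an attempt budget.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a source controller, not struct nvme_ctrl. */
struct mock_ctrl {
	unsigned int cntlid;
	int ccr_result;	/* 0, -EIO (command failed), or another negative errno */
};

/* Assumed lookup rule: first controller with cntlid >= min_cntlid. */
static const struct mock_ctrl *mock_find_ctrl(const struct mock_ctrl *ctrls,
					      size_t n, unsigned int min_cntlid)
{
	for (size_t i = 0; i < n; i++)
		if (ctrls[i].cntlid >= min_cntlid)
			return &ctrls[i];
	return NULL;
}

/*
 * Model of the loop in nvme_fence_ctrl().  max_attempts stands in for the
 * deadline.  retry_all == 0 models v4 as posted (only -EIO moves on to the
 * next controller); retry_all != 0 models the loop with the check removed.
 */
static int mock_fence(const struct mock_ctrl *ctrls, size_t n,
		      int max_attempts, int retry_all)
{
	unsigned int min_cntlid = 0;

	assert(ctrls != NULL || n == 0);

	while (max_attempts-- > 0) {
		const struct mock_ctrl *sctrl;
		int ret;

		sctrl = mock_find_ctrl(ctrls, n, min_cntlid);
		if (!sctrl)
			return -EIO;	/* ran out of source controllers */

		ret = sctrl->ccr_result;
		if (!ret)
			return 0;

		min_cntlid = sctrl->cntlid + 1;
		if (retry_all || ret == -EIO)
			continue;

		return ret;	/* v4: failed operation ends the attempt */
	}

	return -ETIMEDOUT;
}
```

With source controllers { cntlid 1: operation fails, cntlid 2: succeeds },
the v4 behaviour gives up after the first controller, while the loop
without the check goes on to succeed through the second one.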

> +       }
> +
> +       dev_info(ictrl->device, "CCR operation timeout\n");
> +       return -ETIMEDOUT;
> +}

Sincerely,
Randy Jennings


