[PATCH v3 08/21] nvme: Implement cross-controller reset recovery

Randy Jennings randyj at purestorage.com
Wed Feb 25 18:37:44 PST 2026


On Fri, Feb 13, 2026 at 8:28 PM Mohamed Khalfella
<mkhalfella at purestorage.com> wrote:
>
> A host that has more than one path connecting to an nvme subsystem
> typically has an nvme controller associated with every path. This is
> mostly applicable to nvmeof. If one path goes down, inflight IOs on that
> path should not be retried immediately on another path because this
> could lead to data corruption as described in TP4129. TP8028 defines
> cross-controller reset mechanism that can be used by host to terminate
> IOs on the failed path using one of the remaining healthy paths. Only
> after IOs are terminated, or long enough time passes as defined by
> TP4129, inflight IOs should be retried on another path. Implement core
> cross-controller reset shared logic to be used by the transports.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella at purestorage.com>
> +static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
> +       if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
> +               ret = -ETIMEDOUT;
> +               goto out;
> +       }
The more I look at this, the less I can ignore that this tmo should be
capped by (deadline - now), so that a single CCR wait cannot run past
the deadline computed in nvme_fence_ctrl().

> +unsigned long nvme_fence_ctrl(struct nvme_ctrl *ictrl)
> +       deadline = now + msecs_to_jiffies(timeout);
> +       while (time_before(now, deadline)) {
...
> +               ret = nvme_issue_wait_ccr(sctrl, ictrl);
...
> +       }
Sincerely,
Randy Jennings


