[RFC PATCH 11/14] nvme-rdma: Use CCR to recover controller that hits an error

Thu Dec 18 18:16:04 PST 2025

On Tue, Nov 25, 2025 at 6:13 PM Mohamed Khalfella
<mkhalfella at purestorage.com> wrote:
>
> An alive nvme controller that hits an error will now move to RECOVERING
> state instead of RESETTING state. In RECOVERING state, ctrl->err_work
> will attempt to use cross-controller recovery to terminate inflight IOs
> on the controller. If CCR succeeds, then switch to RESETTING state and
> continue error recovery as usuall by tearing down the controller, and
> attempting reconnect to target. If CCR fails, the behavior of recovery
"usuall" -> "usual"
"attempting reconnect" -> "attempting to reconnect"

It would also read better with "the" added:
"reconnect to the target"

> depends on whether CQT is supported or not. If CQT is supported, switch
> to time-based recovery by holding inflight IOs until it is safe for them
> to be retried. If CQT is not supported proceed to retry requests
> immediately, as the code currently does.
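For reference, the decision flow described in the commit message can be modeled as below. This is only a sketch of the prose above, not code from the patch; the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these are in the patch. */
enum recovery_action {
	RECOVERY_RESET_NOW,     /* CCR succeeded: RESETTING, tear down, reconnect */
	RECOVERY_HOLD_FOR_CQT,  /* CCR failed, CQT supported: hold inflight IOs */
	RECOVERY_RETRY_NOW,     /* CCR failed, no CQT: retry immediately */
};

/* Map the two conditions named in the commit message to the next step. */
static enum recovery_action pick_recovery_action(bool ccr_succeeded,
						 bool cqt_supported)
{
	if (ccr_succeeded)
		return RECOVERY_RESET_NOW;
	if (cqt_supported)
		return RECOVERY_HOLD_FOR_CQT;
	return RECOVERY_RETRY_NOW;
}
```

Note that CQT support only matters on the CCR failure path; a successful CCR always proceeds to the usual reset.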


> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 190a4cfa8a5e..4a8bb2614468 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c

> +static int nvme_rdma_recover_ctrl(struct nvme_ctrl *ctrl)

> +       queue_delayed_work(nvme_reset_wq, &to_rdma_ctrl(ctrl)->err_work, rem);
nvme_rdma_recover_ctrl() is exactly the same as
nvme_tcp_recover_ctrl(). It seems like a common nvme_recover_ctrl()
in core.c could take the transport's delayed work (err_work) as a
parameter, unifying the code.
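A rough userspace model of that shape follows. It is untested against the real code and makes assumptions: the body of the helper is not quoted here, so the "rem" field and the early-return logic are guesses based on the queue_delayed_work() call above, and all struct definitions are stand-ins for the kernel types.

```c
/* Userspace stand-ins for the kernel types; not the real definitions. */
struct delayed_work {
	int queued;           /* set when the work is (re)queued */
	unsigned long delay;  /* delay it was queued with */
};

struct nvme_ctrl {
	unsigned long rem;    /* hypothetical: time left before IOs may be retried */
};

/* Stand-in for queue_delayed_work(nvme_reset_wq, dwork, delay). */
static void model_queue_delayed_work(struct delayed_work *dwork,
				     unsigned long delay)
{
	dwork->queued = 1;
	dwork->delay = delay;
}

/* Candidate shared helper: the transport passes in its own err_work. */
static int nvme_recover_ctrl(struct nvme_ctrl *ctrl,
			     struct delayed_work *err_work)
{
	if (!ctrl->rem)
		return 0;  /* nothing to wait for; caller continues recovery */
	model_queue_delayed_work(err_work, ctrl->rem);
	return 1;          /* caller returns now and runs again later */
}

/* Each transport keeps its own err_work next to the generic ctrl. */
struct nvme_rdma_ctrl { struct delayed_work err_work; struct nvme_ctrl ctrl; };
struct nvme_tcp_ctrl  { struct delayed_work err_work; struct nvme_ctrl ctrl; };
```

With this shape, both transports would call nvme_recover_ctrl(&ctrl->ctrl, &ctrl->err_work), and the to_rdma_ctrl()/to_tcp_ctrl() lookups would stay in the callers.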

>  static void nvme_rdma_error_recovery_work(struct work_struct *work)
>  {
> -       struct nvme_rdma_ctrl *ctrl = container_of(work,
> +       struct nvme_rdma_ctrl *ctrl = container_of(to_delayed_work(work),
>                         struct nvme_rdma_ctrl, err_work);
>
> +       if (nvme_ctrl_state(&ctrl->ctrl) == NVME_CTRL_RECOVERING) {
> +               if (nvme_rdma_recover_ctrl(&ctrl->ctrl))
> +                       return;
> +       }
> +
>         nvme_stop_keep_alive(&ctrl->ctrl);
The controller should not be in the LIVE state while waiting for
recovery, so I do not think we will succeed in sending keep-alives.
Still, I think the nvme_stop_keep_alive() call should move to before
(or inside of) nvme_rdma_recover_ctrl().
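A minimal userspace model of the suggested ordering is below. The struct, its fields, and the function names are hypothetical stand-ins, and the recover step is reduced to a flag; the only point is that keep-alive is stopped before the early return.

```c
/* Hypothetical model; these are not the kernel's types or functions. */
enum model_state { MODEL_RECOVERING, MODEL_RESETTING };

struct model_ctrl {
	enum model_state state;
	int keep_alive_running;  /* 1 while keep-alive work is active */
	int recover_pending;     /* 1 if recovery must wait and re-run later */
	int torn_down;           /* 1 once the normal teardown path has run */
};

/* Stand-in for the recover step: nonzero means "come back later". */
static int model_recover_ctrl(struct model_ctrl *c)
{
	return c->recover_pending;
}

static void model_error_recovery_work(struct model_ctrl *c)
{
	/* Stop keep-alive first so the early-return path is covered too. */
	c->keep_alive_running = 0;

	if (c->state == MODEL_RECOVERING && model_recover_ctrl(c))
		return;

	c->torn_down = 1;  /* rest of the usual error recovery */
}
```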

Sincerely,
Randy Jennings