[PATCH v2 12/14] nvme-fc: Decouple error recovery from controller reset
James Smart
jsmart833426 at gmail.com
Tue Feb 3 14:49:01 PST 2026
On 2/3/2026 11:19 AM, James Smart wrote:
> On 1/30/2026 2:34 PM, Mohamed Khalfella wrote:
...
>> static void
>> nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
>> {
>> @@ -2049,9 +2061,8 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
>> nvme_fc_complete_rq(rq);
>> check_error:
>> - if (terminate_assoc &&
>> - nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_RESETTING)
>> - queue_work(nvme_reset_wq, &ctrl->ioerr_work);
>> + if (terminate_assoc)
>> + nvme_fc_start_ioerr_recovery(ctrl, "io error");
>
> this is ok. the ioerr_recovery will bounce the RESETTING state if it's
> already in the state. So this is a little cleaner.
What is problematic here is that if the start_ioerr path includes the
CONNECTING logic that terminates i/o's, it runs in the context of the LLDD
that called this iodone routine. Not good. In the existing code, the LLDD
context was swapped to the work queue, and error_recovery was called from
there.
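
To illustrate the context swap, here is a rough userspace model, not driver
code. Every name in it (model_ctrl, model_queue_work, model_fcpio_done,
model_run_worker, model_recover) is a hypothetical stand-in. The only thing
taken from the patch is the shape of the removed
queue_work(nvme_reset_wq, &ctrl->ioerr_work) call: the iodone path only marks
work pending and returns, and the i/o termination runs later in the worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in nvme-fc. */
enum exec_ctx { CTX_LLDD, CTX_WORKER };

struct model_ctrl {
	bool work_pending;          /* models ctrl->ioerr_work being queued */
	int recoveries_run;         /* number of times recovery has run */
	enum exec_ctx recovery_ctx; /* context the last recovery ran in */
};

/* Context the "CPU" is currently executing in; starts in the LLDD. */
enum exec_ctx current_ctx = CTX_LLDD;

/* Heavy path: models terminating outstanding i/o on the association. */
void model_recover(struct model_ctrl *c)
{
	assert(c != NULL);
	c->recovery_ctx = current_ctx;
	c->recoveries_run++;
}

/*
 * Models queue_work(): only marks the item pending. Returns false if the
 * item was already pending, so repeated errors collapse into one run.
 */
bool model_queue_work(struct model_ctrl *c)
{
	if (c->work_pending)
		return false;
	c->work_pending = true;
	return true;
}

/* Models the iodone callback invoked by the LLDD: defer, never recover inline. */
void model_fcpio_done(struct model_ctrl *c, bool terminate_assoc)
{
	if (terminate_assoc)
		model_queue_work(c);
}

/* Models the workqueue thread picking up the pending item. */
void model_run_worker(struct model_ctrl *c)
{
	enum exec_ctx saved = current_ctx;

	current_ctx = CTX_WORKER;
	if (c->work_pending) {
		c->work_pending = false;
		model_recover(c);
	}
	current_ctx = saved;
}
```

In this model, calling model_fcpio_done() twice with terminate_assoc set and
then model_run_worker() once results in a single recovery, recorded as having
run in CTX_WORKER rather than CTX_LLDD.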
>
>> }
>> static int
>> @@ -2495,39 +2506,6 @@ __nvme_fc_abort_outstanding_ios(struct
>> nvme_fc_ctrl *ctrl, bool start_queues)
>> nvme_unquiesce_admin_queue(&ctrl->ctrl);
>> }
>> -static void
>> -nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
>> -{
>> - enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
>> -
>> - /*
>> - * if an error (io timeout, etc) while (re)connecting, the remote
>> - * port requested terminating of the association (disconnect_ls)
>> - * or an error (timeout or abort) occurred on an io while creating
>> - * the controller. Abort any ios on the association and let the
>> - * create_association error path resolve things.
>> - */
>> - if (state == NVME_CTRL_CONNECTING) {
>> - __nvme_fc_abort_outstanding_ios(ctrl, true);
>> - dev_warn(ctrl->ctrl.device,
>> - "NVME-FC{%d}: transport error during (re)connect\n",
>> - ctrl->cnum);
>> - return;
>> - }
>
> This logic needs to be preserved. It's no longer part of
> nvme_fc_start_ioerr_recovery(). Failures during CONNECTING should not be
> "fenced". They should fail immediately.
this logic, if left in start_ioerr_recovery
-- james