[PATCH V3 2/2] nvme: Add support for FW activation without reset

Arnav Dawn a.dawn at samsung.com
Sun Jun 25 02:19:22 PDT 2017



On Friday 23 June 2017 08:55 PM, Keith Busch wrote:
> On Fri, Jun 23, 2017 at 05:41:21PM +0530, Arnav Dawn wrote:
>> On Wednesday 21 June 2017 07:07 PM, Keith Busch wrote:
>>> One last thing I noticed, it looks like the nvme_fw_act_work will break
>>> if an IO timeout occurs while it's running since that may reset the
>>> controller and restart IO queues.
>> could you please clarify more?
>> I think ,that, if an IO Timeout resets controller and restarts queue,
>> the nvme_ctrl_pp_status will return false, ending the nvme_fw_act_work.
> Sure, the nvme_fw_act_work will observe CSTS.PP cleared, then it restarts
> the IO queues, but the nvme reset requires those queues to be quiesced.

Thanks for clarifying,
i have made following changes based on comments, please review.
 
1. Added a delay of 100ms for scheduling fw_act_work,
as work without delay could schedule almost immediately which
may not give time to the controller to set CSTS.PP. 
Also, Moved fw_act_timeout calculation to the fw_act_work.

+       case NVME_AER_NOTICE_FW_ACT_STARTING:
+               schedule_delayed_work(&ctrl->fw_act_work,
msecs_to_jiffies(100));
+               break;
+       case NVME_AER_ERR_FW_IMG_LOAD:
+               dev_warn(ctrl->device, "FW image load error\n");
+               cancel_delayed_work(&ctrl->fw_act_work);
+               break;


2. Added a check, that the controller is in NVME_CTRL_LIVE state
before starting queues once PP is cleared or fw activation times out.
This prevents starting of queues unless the controller is in Live state.

+static void nvme_fw_act_work(struct work_struct *work)
+{
+       struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
+                               struct nvme_ctrl, fw_act_work);
+       struct nvme_fw_slot_info_log *log;
+       unsigned long fw_act_timeout;
+
+       if (ctrl->mtfa)
+               fw_act_timeout = jiffies +
+                               msecs_to_jiffies(ctrl->mtfa * 100);
+       else
+               fw_act_timeout = jiffies +
+                               msecs_to_jiffies(admin_timeout * 1000);
+
+       nvme_stop_queues(ctrl);
+       while (nvme_ctrl_pp_status(ctrl)) {
+               if (time_after(jiffies, fw_act_timeout)) {
+                       dev_warn(ctrl->device,
+                               "Fw activation timeout, reset
controller\n");
+                       ctrl->ops->reset_ctrl(ctrl);
+                       break;
+               }
+               msleep(100);
+       }
+
+       if(ctrl->state != NVME_CTRL_LIVE)
+               return;
+
+       nvme_start_queues(ctrl);
+       /* read FW slot informationi to clear the AER*/
+       log = kmalloc(sizeof(struct nvme_fw_slot_info_log), GFP_KERNEL);
+       if (!log)
+               return;
+
+       if (nvme_get_fw_slot_info(ctrl, log))
+               dev_warn(ctrl->device,
+                               "Get FW SLOT INFO log error\n");
+       kfree(log);
+}

if this is fine i will send out the next version of patch.

Regards
Arnav Dawn




More information about the Linux-nvme mailing list