[PATCH v3 5/5] nvme_fc: add dev_loss_tmo timeout and remoteport resume support
James Smart
jsmart2021 at gmail.com
Fri Oct 20 11:53:07 PDT 2017
On 10/19/2017 11:27 PM, Hannes Reinecke wrote:
...
>> +static struct nvme_fc_rport *
>> +nvme_fc_attach_to_suspended_rport(struct nvme_fc_lport *lport,
>> + struct nvme_fc_port_info *pinfo)
>> +{
...
>> +
>> + /* has it been unregistered */
>> + if (rport->remoteport.port_state != FC_OBJSTATE_DELETED) {
>> + /* means lldd called us twice */
>> + spin_unlock_irqrestore(&rport->lock, flags);
>> + nvme_fc_rport_put(rport);
>> + return ERR_PTR(-ESTALE);
>> + }
>> +
> Why can't we re-use this rport?
> It's patently _not_ deleted, so we should be good to use it, right?
> Someone else might've messed up, but if _we_ get our reference counting
> right that shouldn't affect us, no?
> Plus: How would we recover from such a situation?
> The rport most likely will _never_ go away, meaning we'll always hit
> this scenario for each and every connection attempt.
> (Speaking from experience here; I've hit this scenario accidentally :-)
> So the only chance we have is a reboot.
Because if it's not in a deleted state, then it's effectively live/present, and there can be only one remoteport with a given WWPN/WWNN on the localport. Hitting this path would mean the LLDD called register for the same remoteport twice, which we don't allow. I'm not looking to allow or survive such a driver error; I'll let the second registration fail, as it is illegal.

How did you hit this scenario? It would really surprise me.
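To make the criterion concrete: the (elided) search loop matches on the WWNN/WWPN pair, roughly along these lines. This is an illustrative sketch only; the helper name is mine, not the patch's:

static struct nvme_fc_rport *
nvme_fc_find_rport_by_wwn(struct nvme_fc_lport *lport,
                          struct nvme_fc_port_info *pinfo)
{
        struct nvme_fc_rport *rport;

        list_for_each_entry(rport, &lport->endp_list, endp_list) {
                if (rport->remoteport.node_name == pinfo->node_name &&
                    rport->remoteport.port_name == pinfo->port_name)
                        return rport;   /* caller checks port_state */
        }

        return NULL;
}

A match in any state other than FC_OBJSTATE_DELETED is therefore a duplicate registration, and the -ESTALE return above is the rejection of it.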
>> }
>>
>> +static void
>> +nvme_fc_ctrl_connectivity_loss(struct nvme_fc_ctrl *ctrl)
>> +{
>> + dev_info(ctrl->ctrl.device,
>> + "NVME-FC{%d}: controller connectivity lost. Awaiting "
>> + "Reconnect", ctrl->cnum);
>> +
>> + switch (ctrl->ctrl.state) {
>> + case NVME_CTRL_NEW:
>> + case NVME_CTRL_LIVE:
>> + /*
>> + * Schedule a controller reset. The reset will
>> + * terminate the association and schedule the
>> + * reconnect timer. Reconnects will be attempted
>> + * until either the ctlr_loss_tmo
>> + * (max_retries * connect_delay) expires or the
>> + * remoteport's dev_loss_tmo expires.
>> + */
>> + if (nvme_reset_ctrl(&ctrl->ctrl)) {
>> + dev_warn(ctrl->ctrl.device,
>> + "NVME-FC{%d}: Couldn't schedule reset. "
>> + "Deleting controller.\n",
>> + ctrl->cnum);
>> + __nvme_fc_del_ctrl(ctrl);
>> + }
>> + break;
>> +
>> + case NVME_CTRL_RECONNECTING:
>> + /*
>> + * The association has already been terminated
>> + * and the controller is attempting reconnects.
>> + * No need to do anything further. Reconnects will
>> + * be attempted until either the ctlr_loss_tmo
>> + * (max_retries * connect_delay) expires or the
>> + * remoteport's dev_loss_tmo expires.
>> + */
>> + break;
>> +
>> + case NVME_CTRL_RESETTING:
>> + /*
>> + * Controller is already in the process of
>> + * terminating the association. No need to do
>> + * anything further. The reconnect step will
>> + * kick in naturally after the association is
>> + * terminated.
>> + */
>> + break;
>> +
>> + case NVME_CTRL_DELETING:
>> + default:
>> + /* no action to take - let it delete */
>> + break;
>> + }
>> +}
>> +
>> /**
>> * nvme_fc_unregister_remoteport - transport entry point called by an
>> * LLDD to deregister/remove a previously
>> @@ -667,15 +868,32 @@ nvme_fc_unregister_remoteport(struct nvme_fc_remote_port *portptr)
>> }
>> portptr->port_state = FC_OBJSTATE_DELETED;
>>
>> - /* tear down all associations to the remote port */
>> - list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list)
>> - __nvme_fc_del_ctrl(ctrl);
>> + list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
>> + /* if dev_loss_tmo==0, dev loss is immediate */
>> + if (!portptr->dev_loss_tmo) {
>> + dev_warn(ctrl->ctrl.device,
>> + "NVME-FC{%d}: controller connectivity lost. "
>> + "Deleting controller.\n",
>> + ctrl->cnum);
>> + __nvme_fc_del_ctrl(ctrl);
>> + } else
>> + nvme_fc_ctrl_connectivity_loss(ctrl);
>> + }
>>
>> spin_unlock_irqrestore(&rport->lock, flags);
>>
>> nvme_fc_abort_lsops(rport);
>>
>> + queue_delayed_work(nvme_wq, &rport->dev_loss_work,
>> + portptr->dev_loss_tmo * HZ);
>> +
>> + /*
>> + * Release the reference. Once all controllers go away (which
>> + * should only occur after dev_loss_tmo expires), this allows
>> + * the rport to be torn down.
>> + */
>> nvme_fc_rport_put(rport);
>> +
>> return 0;
>> }
>> EXPORT_SYMBOL_GPL(nvme_fc_unregister_remoteport);
>
> Hmm.
> The dev_loss_tmo is pretty awkward; scheduling a per-rport structure
> only to tear down all attached _controllers_ once an rport goes down.
> But seeing that __nvme_fc_del_ctrl() is actually just punting the
> deletion to a workqueue I guess that's okay.
I guess so. It's just reusing the existing delete code paths, which set the
nvme controller state.
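For reference, the delete path being reused is essentially the following (paraphrased from memory, so details may differ from the tree this series applies to): it just moves the controller to DELETING and punts the teardown to a work item, which is why scheduling it for every controller on the rport is cheap.

static int
__nvme_fc_del_ctrl(struct nvme_fc_ctrl *ctrl)
{
        /* mark the controller as going away; fails if already deleting */
        if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
                return -EBUSY;

        /* actual association teardown runs from the delete work item */
        return queue_work(nvme_wq, &ctrl->delete_work) ? 0 : -EBUSY;
}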
> Also we might end up having called __nvme_fc_del_ctrl() for each
> controller, making me wonder why we need to schedule a dev_loss_work if
> all controllers are being removed anyway.
I'm not sure I follow, but... We could defer to ctlr_loss_tmo
completely, but it may be longer than dev_loss_tmo, and I think that
confuses people who set the dev_loss_tmo value and expect things to
terminate at dev_loss_tmo. Currently, the controller delete occurs at
min(dev_loss_tmo, ctlr_loss_tmo).
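In other words, with ctlr_loss_tmo derived from the reconnect parameters, the point at which a controller is given up is simply whichever limit expires first. A sketch only, not code from the patch:

/* sketch: the effective give-up time per the description above */
static unsigned int
effective_loss_tmo(unsigned int max_retries, unsigned int connect_delay,
                   unsigned int dev_loss_tmo)
{
        unsigned int ctlr_loss_tmo = max_retries * connect_delay;

        return ctlr_loss_tmo < dev_loss_tmo ? ctlr_loss_tmo : dev_loss_tmo;
}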
> Also I'm not exactly thrilled with the reference counting; we put things
> on a workqueue but do not increase the reference for it.
> Which means that the rport structure might be freed even though it's
> still on the workqueue. Shouldn't we rather take a reference before
> putting it on the workqueue, and drop the reference once we're done with
> the workqueue function?
Ok, I agree. I'll rework it so that there is no dev_loss_tmo timer
running. Instead, we'll record the rport connectivity-loss time, and in
the controller reconnect loop, if we try to reconnect but connectivity
has been lost for longer than dev_loss_tmo, we'll give up.
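Something along these lines is what I have in mind; the dev_loss_start field and the helper name are illustrative, not final:

/* dev_loss_start would be stamped with jiffies in nvme_fc_unregister_remoteport() */
static bool
nvme_fc_rport_connectivity_expired(struct nvme_fc_rport *rport)
{
        return rport->remoteport.port_state != FC_OBJSTATE_ONLINE &&
               time_after(jiffies, rport->dev_loss_start +
                                   rport->remoteport.dev_loss_tmo * HZ);
}

The reconnect work would call this before scheduling another attempt and delete the controller instead once it returns true.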
-- james