[PATCH v1 0/5] Fix keep-alive mechanism for fabrics
Max Gurtovoy
maxg at mellanox.com
Thu Apr 12 01:49:03 PDT 2018
On 4/11/2018 7:48 PM, James Smart wrote:
> On 4/11/2018 7:40 AM, Max Gurtovoy wrote:
>>
>>
>> On 4/11/2018 5:07 PM, Sagi Grimberg wrote:
>>>
>>>>> I don't understand what you mean by a "property of the admin queue",
>>>>> there is no such definition in the spec. In any event, the keep alive
>>>>> mechanism was defined to detect host/controller health on a periodic
>>>>> basis.
>>>>>
>>>>> The spec clearly states that the host needs to send keep alive at
>>>>> a faster rate than the Keep alive timeout accounting things such
>>>>> as transport roundtrip times, transport delays, keep alive timer
>>>>> granularity and so on.
>>>>>
>>>>> To me, this is a clear case where your use-case requires a larger
>>>>> keep alive timeout. I'm not sure why sprinkling the keep alive
>>>>> execution around solves or helps anything. To me this looks
>>>>> completely redundant (sorry).
>>>>
>>>> I think that starting the keep-alive timer at the target side after
>>>> admin connect while starting the keep-alive timer at the host side
>>>> only after all io-queues connect is wrong.
>>>> In my solution I start the keep-alive mechanism on the host side
>>>> after admin connect as well (exactly like the target side).
>
> well - true, but I believe it should be started after the controller is
> enabled - not just that the admin queue has been created. I don't think
> even the admin queue can be serviced till the adapter is enabled. Thus,
> I would agree with moving nvme_start_keep_alive() from nvme_start_ctrl()
> to nvme_enable_ctrl(). We would also want the "enablement" to detect
> the keepalive style (today's keepalive or TBKA) supported by the time
> that call is made. Given that needs to be known, I believe there will
> always be a small window where a couple of things have to be accessed
> before the KA starts.
At least I see that you agree to start the keep-alive timer before
creating the IO queues, so that's progress :).
The reason I didn't put it in nvme_enable_ctrl is that the FC code never
calls nvme_disable_ctrl (which is where we should probably stop the timer).
I'm trying to solve the issue without changing the whole design of the FC
implementation. We can do it incrementally.
>
> I strongly support Sagi's statements on trying to keep the KA start/stop
> in common code rather than sprinkling it around. I agree with the
> current placement of the stop code in nvme_stop_ctrl(). The stop ctrl
> should always be called prior to the admin queue being torn down in the
> transport (as part of kicking off the reset of the controller).
If we add the KA start to nvme_enable_ctrl, then the KA stop should be
added to nvme_disable_ctrl. We should be symmetric.
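And the symmetric counterpart (again only a sketch, assuming FC is
eventually changed to call nvme_disable_ctrl like the other transports):
--
int nvme_disable_ctrl(struct nvme_ctrl *ctrl, u64 cap)
{
	/* stop the periodic keep-alive before the admin queue is torn down */
	nvme_stop_keep_alive(ctrl);

	/* ... existing CC.EN clearing and wait for CSTS.RDY == 0 elided ... */
	return 0;
}
--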
>
> Note: the FC patch is way off. Also note that I've killed the
> xxx_is_ready() routines in the updated if_ready patch I sent. So your
> mods wouldn't apply (soon).
I rebased over the latest nvme branch and your xxx_is_ready is not there.
If you have a better solution, please share it with us.
>
>>>
>>> That does not guarantee anything either, as you were simply able to
>>> minimize the effect.
>>>
>>> An alternative patch would also be able to minimize the effect:
>>> --
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index e1b708ee6783..a84198b1e0d8 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -847,7 +847,7 @@ static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
>>>          INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
>>>          memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
>>>          ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
>>> -        schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
>>> +        schedule_delayed_work(&ctrl->ka_work, 0);
>>>  }
>>>
>>> void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
>>> --
>>>
>>> This way the first keep alive will execute immediately the first time
>>> rather than kato seconds later...
>>
>> This is a good patch, and I can add it to the series.
>
> Uh... I wouldn't want to do this. You can easily burn lots of cycles
> sending a KA.
I don't really mind losing some cycles during connection establishment
in order to do the right thing.
>
> I would lean toward making people get used to setting KATO to a large
> enough value that it can be used equally well with TBKA - thus it should
> be something like this instead:
> schedule_delayed_work(&ctrl->ka_work, (ctrl->kato * HZ) / 2);
People are using the default value. I guess I can live with (ctrl->kato
* HZ) / 2 as the first delay if we start the timer before IO queue creation.
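Applied to the same hunk Sagi posted above, that would look roughly like
this (sketch only):
--
-	schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+	schedule_delayed_work(&ctrl->ka_work, (ctrl->kato * HZ) / 2);
--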
>
> This is somewhat independent from the grace value - as grace is a
> scaling factor to accommodate congestion and delayed processing.
>
> -- james
>
To summarize: there are different perspectives on this issue, which is
why I think the spec should be updated to clear things up.
The spec should say explicitly that the KA timer should be started after
admin_queue connect and before starting to perform io_connect.
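Written out as a rough sketch of the connect flow I have in mind (the
helper names below are only illustrative; nvmf_connect_io_queues() stands
for whatever per-transport loop over nvmf_connect_io_queue() is used):
--
	ret = nvmf_connect_admin_queue(ctrl);	/* admin connect */
	if (ret)
		return ret;

	nvme_start_keep_alive(ctrl);		/* host KA starts here ... */

	ret = nvmf_connect_io_queues(ctrl);	/* ... not only after this */
	if (ret) {
		nvme_stop_keep_alive(ctrl);
		return ret;
	}
--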
Christoph,
what do you think about the asymmetry we have here and the patches that
fix it?
-Max.