[PATCH v1 0/5] Fix keep-alive mechanism for fabrics

James Smart james.smart at broadcom.com
Wed Apr 11 09:48:48 PDT 2018


On 4/11/2018 7:40 AM, Max Gurtovoy wrote:
>
>
> On 4/11/2018 5:07 PM, Sagi Grimberg wrote:
>>
>>>> I don't understand what you mean by a "property of the admin queue",
>>>> there is no such definition in the spec. In any event, the keep alive
>>>> mechanism was defined to detect host/controller health on a periodic
>>>> basis.
>>>>
>>>> The spec clearly states that the host needs to send keep alive at
>>>> a faster rate than the Keep alive timeout accounting things such
>>>> as transport roundtrip times, transport delays, keep alive timer
>>>> granularity and so on.
>>>>
>>>> To me, this is a clear case where your use-case requires a larger
>>>> keep alive timeout. I'm not sure how sprinkling the keep alive
>>>> execution around solves/helps anything. To me this looks completely
>>>> redundant (sorry).
>>>
>>> I think that starting the keep-alive timer at the target side after
>>> admin connect, while starting the keep-alive timer at the host side
>>> only after all io-queues connect, is wrong.
>>> In my solution I start the keep-alive mechanism on the host side
>>> also after admin connect (exactly as the target side does).

Well - true, but I believe it should be started after the controller is
enabled, not merely once the admin queue has been created.  I don't think
even the admin queue can be serviced until the adapter is enabled.
Thus, I would agree with moving nvme_start_keep_alive() from
nvme_start_ctrl() to nvme_enable_ctrl(), roughly as in the sketch below.
We would also want the "enablement" to detect which keepalive style
(today's keepalive or TBKA) is supported by the time that call is made.
Given that needs to be known, I believe there will always be a small
window where a couple of things have to be accessed before the KA starts.
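
Roughly what I have in mind, as a hand-written sketch (untested, and the
nvme_enable_ctrl()/nvme_start_ctrl() context lines below are from memory,
so treat them as illustrative only):

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl, u64 cap)
 	ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
 	if (ret)
 		return ret;
-	return nvme_wait_ready(ctrl, cap, true);
+
+	ret = nvme_wait_ready(ctrl, cap, true);
+	if (ret)
+		return ret;
+
+	/*
+	 * Controller is now enabled - start keep alive here rather than
+	 * in nvme_start_ctrl().  Ideally we'd already know whether the
+	 * controller wants today's KA or TBKA at this point, which is the
+	 * small window mentioned above.
+	 */
+	if (ctrl->kato)
+		nvme_start_keep_alive(ctrl);
+	return 0;
 }
@@ @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 {
-	if (ctrl->kato)
-		nvme_start_keep_alive(ctrl);
-
 	if (ctrl->queue_count > 1) {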

I strongly support Sagi's statements on trying to keep the KA start/stop 
in common code rather than sprinkling it around.   I agree with the 
current placement of the stop code in nvme_stop_ctrl().   The stop ctrl 
should always be called prior to the admin queue being torn down in the 
transport (as part of kicking off the reset of the controller).
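
I.e. keep roughly what common code does today (body quoted from memory,
so the exact set of work items flushed may differ) - every transport
calls this before it starts tearing down the admin queue in its
reset/error-recovery path:

void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
{
	/* stop keep alive first, before any admin queue teardown */
	nvme_stop_keep_alive(ctrl);
	flush_work(&ctrl->async_event_work);
	flush_work(&ctrl->scan_work);
}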

Note: the FC patch is way off.  Also note that I've killed the
xxx_is_ready() routines in the updated if_ready patch I sent, so your
mods won't apply (soon).

>>
>> That does not guarantee anything either. as you were simply able to
>> minimize the effect.
>>
>> An alternative patch would also be able to minimize the effect:
>> -- 
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index e1b708ee6783..a84198b1e0d8 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -847,7 +847,7 @@ static void nvme_start_keep_alive(struct 
>> nvme_ctrl *ctrl)
>>          INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
>>          memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
>>          ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
>> -       schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
>> +       schedule_delayed_work(&ctrl->ka_work, 0);
>>   }
>>
>>   void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
>> -- 
>>
>> This way the first keep alive will execute immediately rather than a
>> full kato later...
>
> This is a good patch, and I can add it to the series.

Uh... I wouldn't want to do this.  You can easily burn lots of cycles 
sending a KA.

I would lean toward getting people used to setting KATO to a value large
enough that it works equally well with TBKA - thus it should be something
like this instead:
        schedule_delayed_work(&ctrl->ka_work, (ctrl->kato * HZ) / 2);

This is somewhat independent of the grace value, as grace is a scaling
factor to accommodate congestion and delayed processing.
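
For completeness, nvme_start_keep_alive() would then look something like
this (sketch only - the re-arm in the keep alive completion path would
want the same /2 scaling applied):

static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
{
	if (unlikely(ctrl->kato == 0))
		return;

	INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
	/* send the first KA at half of KATO, not a full KATO out */
	schedule_delayed_work(&ctrl->ka_work, (ctrl->kato * HZ) / 2);
}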

-- james
