[PATCH 2/3] nvme: sanitize KATO setting

Mon Mar 8 13:54:33 GMT 2021

On 3/8/21 2:11 PM, Max Gurtovoy wrote:
> 
> On 3/2/2021 11:26 AM, Hannes Reinecke wrote:
>> According to the NVMe base spec the KATO commands should be sent
>> at half of the KATO interval, to properly account for round-trip
>> times.
>> As we now will only ever send one KATO command per connection we
>> can easily use the recommended values.
>> This also fixes a potential issue where the request timeout for
>> the KATO command does not match the value in the connect command,
>> which might be causing spurious connection drops from the target.
>>
>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>> ---
>>   drivers/nvme/host/core.c    | 9 ++++++---
>>   drivers/nvme/host/fabrics.c | 2 +-
>>   drivers/nvme/host/nvme.h    | 9 ++++++++-
>>   3 files changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index f890b310499e..6d096d41a82f 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1223,7 +1223,8 @@ static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
>>           startka = true;
>>       spin_unlock_irqrestore(&ctrl->lock, flags);
>>       if (startka)
>> -        queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
>> +        queue_delayed_work(nvme_wq, &ctrl->ka_work,
>> +                   NVME_KATO_DELAY(ctrl->kato) * HZ);
>>   }
>>
>>   static int nvme_keep_alive(struct nvme_ctrl *ctrl)
>> @@ -1258,7 +1259,8 @@ static void nvme_keep_alive_work(struct work_struct *work)
>>           dev_dbg(ctrl->device,
>>               "reschedule traffic based keep-alive timer\n");
>>           ctrl->comp_seen = false;
>> -        queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
>> +        queue_delayed_work(nvme_wq, &ctrl->ka_work,
>> +                   NVME_KATO_DELAY(ctrl->kato) * HZ);
>>           return;
>>       }
>>
>> @@ -1275,7 +1277,8 @@ static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
>>       if (unlikely(ctrl->kato == 0))
>>           return;
>>
>> -    queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
>> +    queue_delayed_work(nvme_wq, &ctrl->ka_work,
>> +               NVME_KATO_DELAY(ctrl->kato) * HZ);
>>   }
>>
>>   void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>> index 5dfd806fc2d2..dba32e39afbf 100644
>> --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -382,7 +382,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
>>        * and add a grace period for controller kato enforcement
>>        */
>>       cmd.connect.kato = ctrl->kato ?
>> -        cpu_to_le32((ctrl->kato + NVME_KATO_GRACE) * 1000) : 0;
>> +        cpu_to_le32(ctrl->kato * 1000) : 0;
>>
>>       if (ctrl->opts->disable_sqflow)
>>           cmd.connect.cattr |= NVME_CONNECT_DISABLE_SQFLOW;
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index 23711f6b7d13..912830389997 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -27,7 +27,14 @@ extern unsigned int admin_timeout;
>>   #define NVME_ADMIN_TIMEOUT    (admin_timeout * HZ)
>>
>>   #define NVME_DEFAULT_KATO    5
>> -#define NVME_KATO_GRACE        10
>> +
>> +/*
>> + * The recommended frequency for KATO commands
>> + * according to NVMe 1.4 section 7.12.1:
>> + * The host should send Keep Alive commands at half of the
>> + * Keep Alive Timeout accounting for transport roundtrip times [..].
>> + */
>> +#define NVME_KATO_DELAY(k)    ((k) >> 1)
> 
> what will happen in case k == 1 ?
> 
> 
Ho-hum. Good point.
Will fix it up.
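
To spell out the k == 1 case: the macro halves the timeout in whole
seconds, so NVME_KATO_DELAY(1) evaluates to 0 and the delay passed to
queue_delayed_work() becomes 0 jiffies, i.e. the keep-alive work would be
requeued with no delay at all. A rough sketch of the arithmetic follows,
not the eventual fix. The HZ value and the helper kato_delay_jiffies()
are stand-ins invented for illustration; halving after the conversion to
jiffies is just one possible way to avoid the truncation.

```c
/* Stand-in for the kernel's HZ (jiffies per second); 250 is an assumption. */
#define HZ 250

/* The macro as posted in the patch: halves the timeout in whole seconds. */
#define NVME_KATO_DELAY(k)    ((k) >> 1)

/* Delay in jiffies as the posted patch would compute it. */
static unsigned long posted_delay_jiffies(unsigned int kato)
{
	return (unsigned long)NVME_KATO_DELAY(kato) * HZ;
}

/*
 * Hypothetical alternative: convert to jiffies first, then halve,
 * so a one-second timeout still yields a non-zero delay.
 */
static unsigned long kato_delay_jiffies(unsigned int kato)
{
	return (unsigned long)kato * HZ / 2;
}
```

With these definitions posted_delay_jiffies(1) is 0, while
kato_delay_jiffies(1) is HZ / 2. Odd timeouts also lose half a second
with the posted macro (for example, the default of 5 is treated as 2).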

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare at suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer