[PATCH v2 3/3] nvme: improve handling of long keep alives

Sagi Grimberg sagi at grimberg.me
Thu May 18 02:01:33 PDT 2023


> Upon keep alive completion, nvme_keep_alive_work is scheduled with the
> same delay every time. If keep alive commands are completing slowly,
> this may cause a keep alive timeout. The following trace illustrates the
> issue, taking KATO = 8 and TBKAS off for simplicity:
> 
> 1. t = 0: run nvme_keep_alive_work, send keep alive
> 2. t = ε: keep alive reaches controller, controller restarts its keep
>            alive timer
> 3. t = 4: host receives keep alive completion, schedules
>            nvme_keep_alive_work with delay 4
> 4. t = 8: run nvme_keep_alive_work, send keep alive
> 
> Here, a keep alive having RTT of 4 causes a delay of at least 8 - ε
> between the controller receiving successive keep alives. With ε small,
> the controller is likely to detect a keep alive timeout.
> 
> Fix this by calculating the RTT of the keep alive command, and adjusting
> the scheduling delay of the next keep alive work accordingly.
> 
> Reported-by: Costa Sapuntzakis <costa at purestorage.com>
> Reported-by: Randy Jennings <randyj at purestorage.com>
> Signed-off-by: Uday Shankar <ushankar at purestorage.com>
> Reviewed-by: Hannes Reinecke <hare at suse.de>
> ---
>   drivers/nvme/host/core.c | 20 +++++++++++++++++---
>   1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 0215dcf643ef..2028fa050bcf 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1162,10 +1162,15 @@ EXPORT_SYMBOL_NS_GPL(nvme_passthru_end, NVME_TARGET_PASSTHRU);
>    * frequency, as one command completion can postpone sending a keep alive
>    * command by up to twice the delay between runs.
>    */
> +static unsigned long nvme_keep_alive_work_period(struct nvme_ctrl *ctrl)
> +{
> +	return (ctrl->ctratt & NVME_CTRL_ATTR_TBKAS) ?
> +		(ctrl->kato * HZ / 4) : (ctrl->kato * HZ / 2);
> +}
> +
>   static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl)
>   {
> -	unsigned long delay = (ctrl->ctratt & NVME_CTRL_ATTR_TBKAS) ?
> -		ctrl->kato * HZ / 4 : ctrl->kato * HZ / 2;
> +	unsigned long delay = nvme_keep_alive_work_period(ctrl);
>   	queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
>   }
>   
> @@ -1175,6 +1180,15 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
>   	struct nvme_ctrl *ctrl = rq->end_io_data;
>   	unsigned long flags;
>   	bool startka = false;
> +	unsigned long rtt = jiffies - nvme_req(rq)->start_time;
> +	unsigned long delay = nvme_keep_alive_work_period(ctrl);
> +
> +	/* Subtract off the keepalive RTT so nvme_keep_alive_work runs
> +	 * at the desired frequency. */
> +	if (rtt <= delay)
> +		delay -= rtt;
> +	else
> +		delay = 0;
This should probably log a warning or something... If the keep-alive RTT
exceeds the entire scheduling delay, the controller is responding so
slowly that we are already at risk of a keep-alive timeout, and that is
worth surfacing rather than silently clamping the delay to 0.


