[PATCHv2] nvme-tcp: Do not terminate i/o commands during RESETTING

Sat Jan 13 18:53:29 PST 2024

Hi Hannes,

On 12/01/2024 12:09, hare at kernel.org wrote:
> From: Hannes Reinecke <hare at suse.de>
> 
> Terminating commands from the timeout handler might lead
> to a data corruption as the timeout might trigger before
> KATO expired.

Can you please explain the data corruption and how this patch is fixing it ?

> When several commands have been sent in a batch and
> the command timeouts trigger just after the keep-alive
> command has been sent then the first command will trigger
> the error recovery. But all other commands will timeout
> directly afterwards and will hit the timeout handler
> before the err_work workqueue handler has started.
> This results in these commands being aborted and
> immediately retried without waiting for KATO.
> So return BLK_EH_RESET_TIMER for I/O commands when
> the controller is in 'RESETTING' or 'DELETING'
> state to ensure that the commands will
> be retried only after the KATO interval.

I'm not sure I understand how does KATO and reconnect_delay are related ?

> 
> Signed-off-by: Hannes Reinecke <hare at suse.de>
> ---
>   drivers/nvme/host/tcp.c | 22 ++++++++++++++++------
>   1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 08805f027810..9dcb2d3b123c 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2431,17 +2431,27 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
>   	struct nvme_tcp_cmd_pdu *pdu = nvme_tcp_req_cmd_pdu(req);
>   	u8 opc = pdu->cmd.common.opcode, fctype = pdu->cmd.fabrics.fctype;
>   	int qid = nvme_tcp_queue_id(req->queue);
> +	enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
>   
>   	dev_warn(ctrl->device,
> -		"queue %d: timeout cid %#x type %d opcode %#x (%s)\n",
> +		"queue %d: timeout cid %#x type %d opcode %#x (%s) state %d\n",
>   		nvme_tcp_queue_id(req->queue), nvme_cid(rq), pdu->hdr.type,
> -		opc, nvme_opcode_str(qid, opc, fctype));
> +		opc, nvme_opcode_str(qid, opc, fctype), state);
> +
> +	/*
> +	 * If the controller is in state RESETTING or DELETING all
> +	 * inflight commands will be terminated soon which in turn
> +	 * may failover to a different path.
> +	 */
> +	if ((state == NVME_CTRL_RESETTING ||
> +	     state == NVME_CTRL_DELETING) && qid > 0)
> +		return BLK_EH_RESET_TIMER;
>   
> -	if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE) {
> +	if (state != NVME_CTRL_LIVE) {
>   		/*
> -		 * If we are resetting, connecting or deleting we should
> -		 * complete immediately because we may block controller
> -		 * teardown or setup sequence
> +		 * If the controller is not live we should complete
> +		 * immediately because we may block controller teardown
> +		 * or setup sequence
>   		 * - ctrl disable/shutdown fabrics requests
>   		 * - connect requests
>   		 * - initialization admin requests