[PATCHv2] nvme-tcp: Do not terminate i/o commands during RESETTING
Sagi Grimberg
sagi at grimberg.me
Mon Jan 15 04:36:33 PST 2024
On 1/12/24 12:09, hare at kernel.org wrote:
> From: Hannes Reinecke <hare at suse.de>
>
> Terminating commands from the timeout handler might lead
> to a data corruption as the timeout might trigger before
> KATO expired.
> When several commands have been sent in a batch and
> the command timeouts trigger just after the keep-alive
> command has been sent then the first command will trigger
> the error recovery. But all other commands will timeout
> directly afterwards and will hit the timeout handler
> before the err_work workqueue handler has started.
> This results in these commands being aborted and
> immediately retried without waiting for KATO.
> So return BLK_EH_RESET_TIMER for I/O commands when
> the controller is in 'RESETTING' or 'DELETING'
> state to ensure that the commands will
> be retried only after the KATO interval.
>
> Signed-off-by: Hannes Reinecke <hare at suse.de>
> ---
> drivers/nvme/host/tcp.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 08805f027810..9dcb2d3b123c 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2431,17 +2431,27 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
> struct nvme_tcp_cmd_pdu *pdu = nvme_tcp_req_cmd_pdu(req);
> u8 opc = pdu->cmd.common.opcode, fctype = pdu->cmd.fabrics.fctype;
> int qid = nvme_tcp_queue_id(req->queue);
> + enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
>
> dev_warn(ctrl->device,
> - "queue %d: timeout cid %#x type %d opcode %#x (%s)\n",
> + "queue %d: timeout cid %#x type %d opcode %#x (%s) state %d\n",
> nvme_tcp_queue_id(req->queue), nvme_cid(rq), pdu->hdr.type,
> - opc, nvme_opcode_str(qid, opc, fctype));
> + opc, nvme_opcode_str(qid, opc, fctype), state);
> +
> + /*
> + * If the controller is in state RESETTING or DELETING all
> + * inflight commands will be terminated soon which in turn
> + * may failover to a different path.
> + */
> + if ((state == NVME_CTRL_RESETTING ||
> + state == NVME_CTRL_DELETING) && qid > 0)
> + return BLK_EH_RESET_TIMER;
>
> - if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE) {
> + if (state != NVME_CTRL_LIVE) {
The remaining states are:
NVME_CTRL_NEW
NVME_CTRL_CONNECTING
NVME_CTRL_DELETING_NOIO
NVME_CTRL_DEAD
The comment perhaps is becoming outdated?
I'm wandering if it would make sense to turn this
into a switch-case statement:
--
switch (state) {
case NVME_CTRL_LIVE:
/*
* LIVE state should trigger the normal error recovery
* which will handle completing this request.
*/
break;
case NVME_CTRL_RESETTING:
/* fallthru */
case NVME_CTRL_DELETING:
if (!qid)
break;
/*
* If the controller is in state RESETTING or DELETING
* all inflight I/O commands will be terminated soon
* which in turn may failover to a different path.
*/
return BLK_EH_RESET_TIMER;
case NVME_CTRL_DELETING_NOIO;
case NVME_CTRL_DEAD;
/* fallthru */
case NVME_CTRL_CONNECTING;
/*
* Fail immediately to avoid a deadlock if the request
* is blocking as part of the ctrl setup or teardown
* sequence (e.g. connect, property_set).
*/
nvme_tcp_complete_timed_out(rq);
return BLK_EH_DONE;
case NVME_CTRL_NEW:
/* fallthru */
default:
WARN_ONCE();
return BLK_EH_RESET_TIMER;
}
nvme_tcp_error_recovery(ctrl);
return BLK_EH_RESET_TIMER;
--
Thoughts?
More information about the Linux-nvme
mailing list