[PATCH 3/3] nvme: start keep-alive after admin queue setup
Sagi Grimberg
sagi at grimberg.me
Mon Nov 20 05:39:16 PST 2023
> Setting up I/O queues might take quite some time on larger and/or
> busy setups, so KATO might expire before all I/O queues could be
> set up.
> Fix this by starting keep-alive from the ->init_ctrl_finish() callback,
> and stopping it when calling nvme_cancel_admin_tagset().
If this is a fix, the title should describe the issue it is fixing, and
the body should say how it is fixing it.
> Signed-off-by: Hannes Reinecke <hare at suse.de>
> ---
> drivers/nvme/host/core.c | 6 +++---
> drivers/nvme/host/fc.c | 6 ++++++
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 62612f87aafa..f48b4f735d2d 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>
> void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
> {
> + nvme_stop_keep_alive(ctrl);
> if (ctrl->admin_tagset) {
> blk_mq_tagset_busy_iter(ctrl->admin_tagset,
> nvme_cancel_request, ctrl);
There is a cross dependency here: nvme_cancel_admin_tagset() now needs
the keep-alive to be stopped first, but the keep-alive work may be
waiting on I/O, which itself needs to be cancelled...
Keep in mind that KATO can be arbitrarily long, so this function may
now block for up to a full KATO period.
I also think the function now does more than simply cancel the
in-flight admin tagset, which is all its name promises.
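A toy userspace model of the ordering concern may make it concrete. It is a sketch under stated assumptions, not kernel code: it assumes (as the comment above worries) that stopping keep-alive waits synchronously for the keep-alive work, that the work may be waiting on an in-flight admin request, and that such a request finishes early only if the admin tagset has already been cancelled, otherwise it takes up to one KATO period to time out. `struct teardown_model` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (assumption, not kernel code): "stop keep-alive" waits for the
 * keep-alive work, which may be waiting on an in-flight admin request.
 * That request completes immediately if the admin tagset was already
 * cancelled; otherwise it is assumed to take one KATO period to time out.
 */
struct teardown_model {
	bool ka_request_inflight; /* keep-alive work is waiting on its request */
	double kato;              /* keep-alive timeout, in seconds */
};

/* Hypothetical: seconds spent blocked in the "stop keep-alive" step. */
static double stop_ka_blocked_time(const struct teardown_model *m,
				   bool tagset_already_cancelled)
{
	assert(m->kato > 0.0);
	if (!m->ka_request_inflight || tagset_already_cancelled)
		return 0.0;	/* nothing to wait for, or request already completed */
	return m->kato;		/* wait for the request to time out */
}

/* Ordering in the patch: stop keep-alive, then cancel the admin tagset. */
static double teardown_stop_then_cancel(const struct teardown_model *m)
{
	return stop_ka_blocked_time(m, false);
}

/* Reverse ordering: cancel the admin tagset, then stop keep-alive. */
static double teardown_cancel_then_stop(const struct teardown_model *m)
{
	return stop_ka_blocked_time(m, true);
}
```

With `{ .ka_request_inflight = true, .kato = 30.0 }`, the stop-then-cancel ordering blocks for the full 30 seconds in this model, while the reverse ordering does not block. The model deliberately ignores keep-alive work being requeued after cancellation.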
> @@ -3200,6 +3201,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
> clear_bit(NVME_CTRL_DIRTY_CAPABILITY, &ctrl->flags);
> ctrl->identified = true;
>
> + nvme_start_keep_alive(ctrl);
> +
I'm fine with moving it here. Alternatively, maybe just change
nvme_start_keep_alive() to use a zero delay and keep it where it
is? Would that help?
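The question of whether a zero delay is enough can be framed with a simple timing model. This is a hedged sketch, not kernel code: it assumes the controller arms its KATO timer when the admin queue Connect completes (as in NVMe over Fabrics), measures time in seconds from that point, and assumes the host sends its first keep-alive `first_delay` seconds after keep-alive is started (KATO/2 is used below as the assumed default) and then every KATO/2, so only the first gap matters. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: where keep-alive is started in the connect sequence. */
enum ka_start_point {
	KA_START_AFTER_IO_SETUP,    /* e.g. nvme_start_ctrl(), after all I/O queues */
	KA_START_AFTER_ADMIN_SETUP, /* e.g. nvme_init_ctrl_finish(), before I/O queues */
};

/*
 * Toy model (assumption): returns true if the controller's KATO timer,
 * armed at admin queue connect (t = 0), expires before the host's first
 * keep-alive arrives. Later keep-alives are assumed to follow every
 * kato/2, so they never let the timer expire once the first one is sent.
 */
static bool kato_expires(double kato, double io_setup_time,
			 enum ka_start_point start, double first_delay)
{
	assert(kato > 0.0 && io_setup_time >= 0.0 && first_delay >= 0.0);

	double start_time =
		(start == KA_START_AFTER_IO_SETUP) ? io_setup_time : 0.0;
	double first_ka = start_time + first_delay;

	return first_ka > kato;
}
```

Under these assumptions, a zero delay only helps while I/O queue setup itself stays shorter than KATO (for example, KATO 10 s with 8 s of setup), whereas starting keep-alive right after admin setup does not depend on the I/O setup time at all.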
> return 0;
> }
> EXPORT_SYMBOL_GPL(nvme_init_ctrl_finish);
> @@ -4333,7 +4336,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
> {
> nvme_mpath_stop(ctrl);
> nvme_auth_stop(ctrl);
> - nvme_stop_keep_alive(ctrl);
> nvme_stop_failfast_work(ctrl);
> flush_work(&ctrl->async_event_work);
> cancel_work_sync(&ctrl->fw_act_work);
> @@ -4344,8 +4346,6 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);
>
> void nvme_start_ctrl(struct nvme_ctrl *ctrl)
> {
> - nvme_start_keep_alive(ctrl);
> -
> nvme_enable_aen(ctrl);
>
> /*
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index a15b37750d6e..a9affc8b755b 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2530,6 +2530,12 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
> * clean up the admin queue. Same thing as above.
> */
> nvme_quiesce_admin_queue(&ctrl->ctrl);
> +
> + /*
> + * Open-coding nvme_cancel_admin_tagset() as fc
> + * is not using nvme_cancel_request().
> + */
> + nvme_stop_keep_alive(&ctrl->ctrl);
> blk_sync_queue(ctrl->ctrl.admin_q);
> blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
> nvme_fc_terminate_exchange, &ctrl->ctrl);
What does this fix? This should really be split out of the patch.