[PATCH 3/3] nvme: start keep-alive after admin queue setup
Hannes Reinecke
hare at suse.de
Mon Nov 20 06:19:11 PST 2023
On 11/20/23 14:39, Sagi Grimberg wrote:
>
>> Setting up I/O queues might take quite some time on larger and/or
>> busy setups, so KATO might expire before all I/O queues could be
>> set up.
>> Fix this by starting keep-alive from the ->init_ctrl_finish() callback,
>> and stopping it when calling nvme_cancel_admin_tagset().
>
> If this is a fix, the title should describe the issue it is fixing, and
> the body should say how it is fixing it.
>
>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>> ---
>> drivers/nvme/host/core.c | 6 +++---
>> drivers/nvme/host/fc.c | 6 ++++++
>> 2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 62612f87aafa..f48b4f735d2d 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>  void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>  {
>> +	nvme_stop_keep_alive(ctrl);
>>  	if (ctrl->admin_tagset) {
>>  		blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>  				nvme_cancel_request, ctrl);
>
> There is a cross dependency here, now nvme_cancel_admin_tagset needs to
> have the keep-alive stopped first, which may be waiting on I/O, which
> needs to be cancelled...
>
> Keep in mind that kato can be arbitrarily long, and now this function
> may be blocked on this kato period.
>
> I also think that now the function is doing something that is more
> than simply cancelling the inflight admin tagset, as it is named.
>
Hmm. I could move it out of cancel_admin_tagset(). It means that I'll
have to touch each transport driver, but as I have to touch at least
fc anyway I guess it's okay.
>> @@ -3200,6 +3201,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
>>  	clear_bit(NVME_CTRL_DIRTY_CAPABILITY, &ctrl->flags);
>>  	ctrl->identified = true;
>> +	nvme_start_keep_alive(ctrl);
>> +
>
> I'm fine with moving it here. But instead, maybe just change
> nvme_start_keep_alive() to use a zero delay and keep it where it
> is? will that help?
>
Not really. We will still fail if setting up the I/O queues takes longer
than the KATO period.
Will be updating the patch.
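
To illustrate the timing argument, here is a toy userspace model (not
kernel code). The struct, the function names and the numbers are all
hypothetical, and it assumes the controller starts counting KATO once
the admin queue is up and only looks at the first keep-alive:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the nvme driver. */
struct ka_timeline {
	unsigned int kato_ms;        /* keep-alive timeout (KATO) */
	unsigned int admin_setup_ms; /* time until the admin queue is usable */
	unsigned int io_setup_ms;    /* time spent setting up the I/O queues */
};

/*
 * Time at which the host sends its first keep-alive.
 * start_after_admin: true  = start keep-alive once the admin queue is set up,
 *                    false = start it only after all I/O queues are set up.
 * delay_ms: delay between starting keep-alive and sending the first command.
 */
static unsigned int first_keep_alive_ms(const struct ka_timeline *t,
					bool start_after_admin,
					unsigned int delay_ms)
{
	unsigned int start = t->admin_setup_ms;

	if (!start_after_admin)
		start += t->io_setup_ms;
	return start + delay_ms;
}

/*
 * Assumption: KATO starts counting when the admin queue is up, and it
 * expires if the first keep-alive arrives more than kato_ms later.
 */
static bool kato_expires(const struct ka_timeline *t,
			 bool start_after_admin, unsigned int delay_ms)
{
	assert(t->kato_ms > 0);
	return first_keep_alive_ms(t, start_after_admin, delay_ms) -
	       t->admin_setup_ms > t->kato_ms;
}
```

With a 5 s KATO and 8 s of I/O queue setup, kato_expires() reports an
expiry for the late start even with delay_ms = 0, whereas the early
start with a KATO/2 delay does not.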
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Frankenstr. 146, 90461 Nürnberg
Managing Directors: I. Totev, A. Myers, A. McDonald, M. B. Moerman
(HRB 36809, AG Nürnberg)