[PATCH 3/3] nvme: start keep-alive after admin queue setup

Hannes Reinecke hare at suse.de
Mon Nov 20 06:19:11 PST 2023


On 11/20/23 14:39, Sagi Grimberg wrote:
> 
>> Setting up I/O queues might take quite some time on larger and/or
>> busy setups, so KATO might expire before all I/O queues could be
>> set up.
>> Fix this by starting keep-alive from the ->init_ctrl_finish() callback,
>> and stopping it when calling nvme_cancel_admin_tagset().
> 
> If this is a fix, the title should describe the issue it is fixing, and
> the body should say how it is fixing it.
> 
>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>> ---
>>   drivers/nvme/host/core.c | 6 +++---
>>   drivers/nvme/host/fc.c   | 6 ++++++
>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 62612f87aafa..f48b4f735d2d 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>   {
>> +    nvme_stop_keep_alive(ctrl);
>>       if (ctrl->admin_tagset) {
>>           blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>                   nvme_cancel_request, ctrl);
> 
> There is a cross dependency here, now nvme_cancel_admin_tagset needs to
> have the keep-alive stopped first, which may be waiting on I/O, which
> needs to be cancelled...
> 
> Keep in mind that kato can be arbitrarily long, and now this function
> may be blocked on this kato period.
> 
> I also think that now the function is doing something that is more
> than simply cancelling the inflight admin tagset, as it is named.
> 
Hmm. I could move it out of nvme_cancel_admin_tagset(). It means that
I'll have to touch each transport driver, but as I have to touch at
least fc anyway I guess it's okay.

>> @@ -3200,6 +3201,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
>>       clear_bit(NVME_CTRL_DIRTY_CAPABILITY, &ctrl->flags);
>>       ctrl->identified = true;
>> +    nvme_start_keep_alive(ctrl);
>> +
> 
> I'm fine with moving it here. But instead, maybe just change
> nvme_start_keep_alive() to use a zero delay and keep it where it
> is? Will that help?
> 
Not really. We will still fail if setting up I/O queues takes longer
than the KATO period.
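
To illustrate the timing argument, here is a rough user-space model. It
is not kernel code, and every name in it is made up for illustration. It
assumes the controller's keep-alive timer starts running once the admin
queue is connected, and that nothing other than a keep-alive command
resets it. The first-delay value passed in is likewise just a parameter
of the model.

```c
#include <stdbool.h>

/* Hypothetical model of one controller connect; all values in ms. */
struct connect_timeline {
	unsigned int kato_ms;      /* keep-alive timeout (KATO) */
	unsigned int io_setup_ms;  /* time spent setting up all I/O queues */
};

/*
 * Time between the admin queue becoming usable and the first
 * keep-alive command reaching the controller.
 *
 * start_after_admin: true  -> keep-alive is started right after admin
 *                             queue setup
 *                    false -> keep-alive is started only after the I/O
 *                             queues have been set up
 * first_delay_ms:    delay between starting keep-alive and sending the
 *                    first keep-alive command (0 models "zero delay")
 */
static unsigned int first_keepalive_gap_ms(const struct connect_timeline *t,
					   bool start_after_admin,
					   unsigned int first_delay_ms)
{
	unsigned int start_offset = start_after_admin ? 0 : t->io_setup_ms;

	return start_offset + first_delay_ms;
}

/* True if the model's controller would hit KATO before the first keep-alive. */
static bool kato_expires(const struct connect_timeline *t,
			 bool start_after_admin,
			 unsigned int first_delay_ms)
{
	return first_keepalive_gap_ms(t, start_after_admin, first_delay_ms) >
	       t->kato_ms;
}
```

With a 5000 ms KATO and 8000 ms of I/O queue setup, the model reports an
expiry when keep-alive is started after the I/O queues, with or without
a zero first delay, and no expiry when it is started right after admin
queue setup with a first delay below KATO.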

Will be updating the patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare at suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Frankenstr. 146, 90461 Nürnberg
Managing Directors: I. Totev, A. Myers, A. McDonald, M. B. Moerman
(HRB 36809, AG Nürnberg)



