[PATCH 3/3] nvme: start keep-alive after admin queue setup

Sagi Grimberg sagi at grimberg.me
Mon Nov 20 11:05:44 PST 2023



On 11/20/23 18:01, Hannes Reinecke wrote:
> On 11/20/23 14:39, Sagi Grimberg wrote:
>>
>>> Setting up I/O queues might take quite some time on larger and/or
>>> busy setups, so KATO might expire before all I/O queues could be
>>> set up.
>>> Fix this by starting the keep-alive from the ->init_ctrl_finish()
>>> callback, and stopping it when calling nvme_cancel_admin_tagset().
>>
>> If this is a fix, the title should describe the issue it is fixing, and
>> the body should say how it is fixing it.
>>
>>> Signed-off-by: Hannes Reinecke <hare at suse.de>
>>> ---
>>>   drivers/nvme/host/core.c | 6 +++---
>>>   drivers/nvme/host/fc.c   | 6 ++++++
>>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 62612f87aafa..f48b4f735d2d 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -483,6 +483,7 @@ EXPORT_SYMBOL_GPL(nvme_cancel_tagset);
>>>   void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
>>>   {
>>> +    nvme_stop_keep_alive(ctrl);
>>>       if (ctrl->admin_tagset) {
>>>           blk_mq_tagset_busy_iter(ctrl->admin_tagset,
>>>                   nvme_cancel_request, ctrl);
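The other half of the change, starting the keep-alive from
->init_ctrl_finish(), is not quoted here. A minimal sketch of that idea
(the placement at the end of nvme_init_ctrl_finish() and the elided
setup are assumptions, not the posted hunk):

int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
{
        /* ... existing identify / feature setup, error handling elided ... */

        /*
         * The admin queue is usable at this point; arming KATO here means
         * it cannot expire while a large or busy setup is still creating
         * its I/O queues.
         */
        nvme_start_keep_alive(ctrl);

        return 0;
}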
>>
>> There is a cross dependency here: nvme_cancel_admin_tagset() now needs
>> to stop the keep-alive first, which may be waiting on admin I/O, which
>> in turn is only cancelled by this very function...
>>
>> Keep in mind that kato can be arbitrarily long, and now this function
>> may be blocked on this kato period.
>>
>> I also think that the function is now doing more than simply cancelling
>> the inflight admin tagset, despite what its name suggests.
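To illustrate the concern (an annotated reading, assuming the body of
nvme_stop_keep_alive() boils down to a cancel_delayed_work_sync() on
ka_work):

void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
{
        /*
         * Waits synchronously for a ka_work instance that is already
         * running, which, as argued above, may itself be stuck on admin
         * I/O for up to the kato period...
         */
        cancel_delayed_work_sync(&ctrl->ka_work);
}

void nvme_cancel_admin_tagset(struct nvme_ctrl *ctrl)
{
        /*
         * ...while that admin I/O is only failed by the busy_iter below,
         * which is never reached while we are blocked in
         * nvme_stop_keep_alive() above: the cross dependency.
         */
        nvme_stop_keep_alive(ctrl);
        if (ctrl->admin_tagset) {
                blk_mq_tagset_busy_iter(ctrl->admin_tagset,
                                nvme_cancel_request, ctrl);
                /* ... */
        }
}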
>>
> I am having a hard time following this reasoning.
> While I do accept that nvme_stop_keep_alive() might trigger I/O
> (i.e. if the keep-alive work has just started running when
> cancel_delayed_work() is called), nvme_tcp_error_recovery_work() has this:
> 
>      nvme_stop_keep_alive(ctrl);
>      flush_work(&ctrl->async_event_work);
>      nvme_tcp_teardown_io_queues(ctrl, false);
>      /* unquiesce to fail fast pending requests */
>      nvme_unquiesce_io_queues(ctrl);
>      nvme_tcp_teardown_admin_queue(ctrl, false);
> 
> and nvme_tcp_teardown_admin_queue() calls nvme_cancel_admin_tagset().
> So by your above reasoning this code should be wrong, too.
> What am I missing?

Need to dig through the history, but it can most definitely move to
after the teardown. It could be a leftover from the earlier days when the
transport fencing was not as reliable, particularly for admin requests.
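
A sketch of that reordering in nvme_tcp_error_recovery_work() (where
exactly the call lands is an assumption, not a posted patch):

        flush_work(&ctrl->async_event_work);
        nvme_tcp_teardown_io_queues(ctrl, false);
        /* unquiesce to fail fast pending requests */
        nvme_unquiesce_io_queues(ctrl);
        nvme_tcp_teardown_admin_queue(ctrl, false);
        /*
         * Moved after the teardown: all admin requests have been
         * cancelled by now, so stopping the keep-alive can no longer
         * block on one of them.
         */
        nvme_stop_keep_alive(ctrl);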


