[PATCH RFC 3/3] nvme: delay failover by command quiesce timeout
Sagi Grimberg
sagi at grimberg.me
Mon Apr 14 15:28:15 PDT 2025
On 10/04/2025 11:51, Mohamed Khalfella wrote:
> On 2025-03-24 13:07:58 +0100, Daniel Wagner wrote:
>> TP4129 mandates that failover be delayed by CQT. Thus, when
>> nvme_decide_disposition returns FAILOVER, do not immediately requeue the
>> request at the namespace level; instead, queue it on the ctrl's request_list
>> and move it to the namespace's requeue_list later.
>>
>> Signed-off-by: Daniel Wagner <wagi at kernel.org>
>> ---
>> drivers/nvme/host/core.c | 19 ++++++++++++++++
>> drivers/nvme/host/fc.c | 4 ++++
>> drivers/nvme/host/multipath.c | 52 ++++++++++++++++++++++++++++++++++++++++---
>> drivers/nvme/host/nvme.h | 15 +++++++++++++
>> drivers/nvme/host/rdma.c | 2 ++
>> drivers/nvme/host/tcp.c | 1 +
>> 6 files changed, 90 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 135045528ea1c79eac0d6d47d5f7f05a7c98acc4..f3155c7735e75e06c4359c26db8931142c067e1d 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -239,6 +239,7 @@ static void nvme_do_delete_ctrl(struct nvme_ctrl *ctrl)
>>
>> flush_work(&ctrl->reset_work);
>> nvme_stop_ctrl(ctrl);
>> + nvme_flush_failover(ctrl);
>> nvme_remove_namespaces(ctrl);
>> ctrl->ops->delete_ctrl(ctrl);
>> nvme_uninit_ctrl(ctrl);
>> @@ -1310,6 +1311,19 @@ static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl)
>> queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
>> }
>>
>> +void nvme_schedule_failover(struct nvme_ctrl *ctrl)
>> +{
>> +	unsigned long delay;
>> +
>> +	if (ctrl->cqt)
>> +		delay = msecs_to_jiffies(ctrl->cqt);
>> +	else
>> +		delay = ctrl->kato * HZ;
> I thought that delay = m * ctrl->kato + ctrl->cqt
> where m = ctrl->ctratt & NVME_CTRL_ATTR_TBKAS ? 3 : 2
> no?
This was said before, but if we are going to always wait for kato before
failing over, we first need a patch that prevents kato from being
arbitrarily long. Let's cap kato at something like 10 seconds (which is 2x
the default, which apparently no one is touching).
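
To make the arithmetic concrete, here is a rough userspace sketch of the
delay Mohamed describes, with kato clamped first. It is only an
illustration, not proposed kernel code: struct ctrl_sketch, KATO_CAP_SECS,
capped_kato_secs() and failover_delay_msecs() are hypothetical names, the
10 second cap is just the number floated above, and it assumes kato is in
seconds and cqt in milliseconds, as the quoted hunk treats them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap, taken from the "something like 10 seconds" suggestion. */
#define KATO_CAP_SECS 10u

/* Illustrative stand-in for the few controller fields used here. */
struct ctrl_sketch {
	uint32_t kato_secs;	/* keep-alive timeout, in seconds */
	uint32_t cqt_msecs;	/* command quiesce time, in milliseconds */
	bool tbkas;		/* traffic-based keep-alive supported */
};

/* Clamp kato so the failover delay cannot grow arbitrarily long. */
static uint32_t capped_kato_secs(uint32_t kato_secs)
{
	return kato_secs > KATO_CAP_SECS ? KATO_CAP_SECS : kato_secs;
}

/* delay = m * kato + cqt, with m = 3 if TBKAS is set and 2 otherwise. */
static uint64_t failover_delay_msecs(const struct ctrl_sketch *c)
{
	uint64_t m = c->tbkas ? 3 : 2;

	return m * capped_kato_secs(c->kato_secs) * 1000u + c->cqt_msecs;
}
```

With the default kato of 5 seconds, no TBKAS and no CQT this gives 10000
ms; a kato of 120 seconds would be clamped and give 20000 ms instead of
240000 ms.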