[PATCH v3 18/21] nvme: Update CCR completion wait timeout to consider CQT
Hannes Reinecke
hare at suse.de
Mon Feb 16 23:09:33 PST 2026
On 2/16/26 19:45, Mohamed Khalfella wrote:
> On Mon 2026-02-16 13:54:18 +0100, Hannes Reinecke wrote:
>> On 2/14/26 05:25, Mohamed Khalfella wrote:
>>> TP8028 Rapid Path Failure Recovery does not define how much time the
>>> host should wait for CCR operation to complete. It is reasonable to
>>> assume that CCR operation can take up to ctrl->cqt. Update wait time for
>>> CCR operation to be max(ctrl->cqt, ctrl->kato).
>>>
>>> Signed-off-by: Mohamed Khalfella <mkhalfella at purestorage.com>
>>> ---
>>> drivers/nvme/host/core.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 0680d05900c1..ff479c0263ab 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -631,7 +631,7 @@ static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
>>>  	if (result & 0x01) /* Immediate Reset Successful */
>>>  		goto out;
>>>
>>> -	tmo = secs_to_jiffies(ictrl->kato);
>>> +	tmo = msecs_to_jiffies(max(ictrl->cqt, ictrl->kato * 1000));
>>>  	if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
>>>  		ret = -ETIMEDOUT;
>>>  		goto out;
>>
>> That is not my understanding. I was under the impression that CQT is the
>> _additional_ time a controller requires to clear out outstanding
>> commands once it detected a loss of communication (ie _after_ KATO).
>> Which would mean we have to wait for up to
>> (ctrl->kato * 1000) + ctrl->cqt.
>
> At this point the source controller knows about communication loss. We
> do not need kato wait. In theory we should just wait for CQT.
> max(cqt, kato) is a conservative guess I made.
>
Not quite. The source controller (on the host!) knows about the
communication loss. But the target might not, as the keep-alive
command might have arrived at the target _just_ before the KATO timer
expired on the host. So the target still sees a healthy connection, and will
be waiting for _another_ KATO interval before declaring
a loss of communication.
And only then will the CQT period start at the target.
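
To put numbers on the two proposals, here is a minimal userspace sketch
(not kernel code, and not what the patch does). The helper names are
made up for illustration. It assumes KATO is given in seconds and CQT in
milliseconds, which is what the unit conversion in the patch suggests.

```c
#include <stddef.h>

/*
 * Hypothetical helpers, for illustration only.
 * Assumption: kato_secs is in seconds, cqt_ms is in milliseconds.
 */

/* Wait proposed in the patch: max(CQT, KATO), in milliseconds. */
static unsigned long ccr_wait_max_ms(unsigned int kato_secs,
				     unsigned int cqt_ms)
{
	unsigned long kato_ms = (unsigned long)kato_secs * 1000UL;

	return kato_ms > cqt_ms ? kato_ms : cqt_ms;
}

/*
 * Worst case described above: the target may need one more full KATO
 * interval to notice the loss, and only then does CQT start.
 */
static unsigned long ccr_wait_sum_ms(unsigned int kato_secs,
				     unsigned int cqt_ms)
{
	return (unsigned long)kato_secs * 1000UL + cqt_ms;
}
```

With example values KATO = 5 s and CQT = 2000 ms, the max() variant
waits 5000 ms, while the worst case above needs 7000 ms.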
Randy, please correct me if I'm wrong ...
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich