[PATCH RFC 3/3] nvme: delay failover by command quiesce timeout
Daniel Wagner
dwagner at suse.de
Tue Apr 15 05:11:04 PDT 2025
On Tue, Apr 15, 2025 at 01:28:15AM +0300, Sagi Grimberg wrote:
> > > +void nvme_schedule_failover(struct nvme_ctrl *ctrl)
> > > +{
> > > + unsigned long delay;
> > > +
> > > + if (ctrl->cqt)
> > > + delay = msecs_to_jiffies(ctrl->cqt);
> > > + else
> > > + delay = ctrl->kato * HZ;
> > I thought that delay = m * ctrl->kato + ctrl->cqt
> > where m = ctrl->ctratt & NVME_CTRL_ATTR_TBKAS ? 3 : 2
> > no?
>
> This was said before, but if we are going to always start waiting for kato
> for failover purposes, we first need a patch that prevents kato from being
> arbitrarily long.
That should be addressed with the cross controller reset (CCR). KATO*n
+ CQT is the upper limit for the target recovery. As soon as we have CCR,
the recovery delay is reduced to the time the CCR exchange takes.
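
To make the KATO*n + CQT bound concrete, here is a rough userspace sketch
of the arithmetic, not the patch itself. The helper name and the SKETCH_
macro are made up for illustration. The units are taken from the quoted
hunk (kato in seconds, cqt in milliseconds), and the TBKAS bit position is
assumed to match the kernel's NVME_CTRL_ATTR_TBKAS.

```c
#include <stdint.h>

/* Assumption: mirrors the kernel's NVME_CTRL_ATTR_TBKAS bit in CTRATT. */
#define SKETCH_CTRATT_TBKAS (1u << 6)

/*
 * Hypothetical helper: upper bound for the failover delay in milliseconds,
 * computed as n * KATO + CQT, where n is 3 if the controller reports
 * traffic based keep alive (TBKAS) and 2 otherwise.
 *
 * kato_sec is in seconds, cqt_ms is in milliseconds.
 */
static uint64_t sketch_failover_delay_ms(uint32_t ctratt, uint32_t kato_sec,
					 uint32_t cqt_ms)
{
	uint64_t n = (ctratt & SKETCH_CTRATT_TBKAS) ? 3 : 2;

	/* Convert KATO to milliseconds before adding CQT. */
	return n * kato_sec * 1000ull + cqt_ms;
}
```

With a KATO of 5 seconds, no TBKAS and no CQT reported, this gives
10000 ms. With TBKAS set and a CQT of 500 ms it gives 15500 ms.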
> Let's cap kato to something like 10 seconds (which is 2x the default, which
> apparently no one is touching).
If I understood TP4129 correctly, the upper limit is now defined, so we
don't have to define our own upper limit.