[PATCH RFC 3/3] nvme: delay failover by command quiesce timeout

Tue Apr 15 23:57:19 PDT 2025

On Tue, Apr 15, 2025 at 05:17:38PM -0700, Mohamed Khalfella wrote:
> Help me see this:
> 
> - nvme_failover_req() is the only place reqs are added to failover_list.
> - nvme_decide_disposition() returns FAILOVER only if req has REQ_NVME_MPATH set.
> 
> How/where do admin requests get REQ_NVME_MPATH set?

Admin commands don't set REQ_NVME_MPATH. This is what the current code
does, and I deliberately decided not to touch it in this RFC.

Given how much discussion the CQT/CCR feature triggers, I don't think
it's a good idea to add this topic to this discussion.

> > > - What about requests that do not go through nvme_failover_req(), like
> > >   passthrough requests, do we not want to hold these requests until it
> > >   is safe for them to be retried?
> > 
> > Passthrough commands should fail immediately. Userland is in charge here,
> > not the kernel. At least this is what should happen here.
> > 
> > > - In case of controller reset or delete if nvme_disable_ctrl()
> > >   successfully disables the controller, then we do not want to add
> > >   canceled requests to failover_list, right? Does this implementation
> > >   consider this case?
> > 
> > Not sure. I've tested a few things but I am pretty sure this RFC is far
> > from being complete.
> 
> I think it does not, and maybe it should honor this. Otherwise every
> controller reset/delete will end up holding requests unnecessarily.

Yes, this is one of the problems with the failover queue. It could be
solved by actually tracking the delay timeout for each command, but
that adds a lot of logic and complexity. That is why, during the
discussion at LSFMM, everyone, including me, said the failover queue
idea should not be our first choice.
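
To make the per-command tracking idea concrete, here is a rough
standalone sketch. It is not kernel code and not part of this RFC:
struct held_req, its fields, and both helpers are hypothetical names
made up for illustration. It assumes each held request records when it
was canceled and the controller's CQT, and that a request canceled
after the controller was successfully disabled does not need to wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one held request; not an existing kernel struct. */
struct held_req {
	uint64_t cancel_ms;	/* time at which the request was canceled */
	uint32_t cqt_ms;	/* command quiesce time reported by the controller */
	bool ctrl_disabled;	/* assumption: controller was disabled successfully */
};

/* Earliest time at which the request may be retried on another path. */
static uint64_t held_req_deadline(const struct held_req *r)
{
	assert(r);
	/* Assumption: after a successful disable there is nothing to wait for. */
	if (r->ctrl_disabled)
		return r->cancel_ms;
	return r->cancel_ms + r->cqt_ms;
}

/* True once the per-command delay has elapsed. */
static bool held_req_can_retry(const struct held_req *r, uint64_t now_ms)
{
	return now_ms >= held_req_deadline(r);
}
```

Even in this toy form every request carries its own timestamp and
needs its own expiry check, which hints at the bookkeeping a real
implementation would have to add.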