[RFC] nvme-rdma: Stop queues when starting with error recovery

Daniel Wagner dwagner at suse.de
Tue May 24 06:38:03 PDT 2022


On Tue, May 24, 2022 at 04:13:28PM +0300, Sagi Grimberg wrote:
> 
> 
> On 5/24/22 16:12, Sagi Grimberg wrote:
> > 
> > 
> > On 5/23/22 18:21, Daniel Wagner wrote:
> > > When we enter error recovery we should stop all queue activities and
> > > all armed timers.
> > > 
> > > For example, we could arming an ANATT timer right before we enter
> > > error recovery but do not successfully recover before the timer
> > > fires. The timer is supposed only be active when the controller is in
> > > LIVE state hence we should call nvme_stop_ctrl when starting with the
> > > recover activites.
> > 
> > This takes me back... IIRC the main reason why the error recovery
> > handler in nvme-rdma (and nvme-tcp) does not start with nvme_stop_ctrl()
> > is because back then, nvme_stop_ctrl() could block until an
> > admin_timeout expired on an inflight admin command (I have a vague
> > memory that if cancelled/flushed the scan_work), which delayed I/O
> > failover, hence we just cherry-picked what needed to be stopped to make
> > sure I/O failover will not be blocked by anything and progress quickly.
> > 
> > If I look at nvme_stop_queue
> 
> nvme_stop_ctrl obviously...
> 
>  it does not touch the scan_work, but it
> > does flush ana_work which includes I/O, so it can still block on it
> > to fail after admin_timeout expires... also flush failfast_work is
> > taking the namespaces_rwsem (although read access). The fw_act_work
> > is probably meaningless for fabrics...
> > 
> > Maybe we need to have something like nvme_stop_ctrl_fast, that will just
> > stop the non-blocking stuff, so failover can work quickly, and then
> > after we have failover, we can do the full nvme_stop_ctrl() or the
> > reminder of nvme_stop_ctrl_fast...
> > 
> > thoughts?

I suspected that nvme_stop_ctrl() might have unwanted side
effects. Maybe factoring out the common bits from nvme_stop_ctrl() into
nvme_stop_ctrl_fast() and call it from here sounds like a reasonable
thing to do. The function name could be better, but naming is hard.



More information about the Linux-nvme mailing list