[PATCH v2] nvme: rdma/tcp: call nvme_mpath_stop() from reconnect workqueue

Martin Wilck mwilck at suse.com
Tue Apr 27 08:30:22 BST 2021


On Tue, 2021-04-27 at 09:45 +0800, Chao Leng wrote:
> 
> 
> On 2021/4/27 0:27, Martin Wilck wrote:
> > On Mon, 2021-04-26 at 16:51 +0200, Christoph Hellwig wrote:
> > > On Fri, Apr 23, 2021 at 05:21:03PM -0700, Sagi Grimberg wrote:
> > > > 
> > > > > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> > > > > index be905d4fdb47..fc07a7b0dc1d 100644
> > > > > --- a/drivers/nvme/host/rdma.c
> > > > > +++ b/drivers/nvme/host/rdma.c
> > > > > @@ -1202,6 +1202,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
> > > > >                  return;
> > > > >          }
> > > > >    +     nvme_mpath_stop(&ctrl->ctrl);
> > > > >          nvme_rdma_reconnect_or_remove(ctrl);
> > > > 
> > > > It's pretty annoying to have to needlessly wait for the ana log page
> > > > request to time out... But this is also needed because we init ana_lock
> > > > in nvme_mpath_init while it can be potentially taken in ana_work, which
> > > > is a precipice for bad things...
> > > 
> > > I also really hate open coding this mpath detail in the transport
> > > drivers.  Didn't you at some point have a series to factor out more
> > > common code from the whole reset and reconnect path?
> > > 
> > 
> > So ... resubmit and cc stable as Sagi suggested, or work out something
> > else?
> Another option: call nvme_mpath_stop() in nvme_mpath_init().
> That way, transport drivers don't need to care about the mpath details.

That would require that we determine in nvme_mpath_init() whether we're
called for the first time for the given controller, or whether the
controller is re-initializing. It's not obvious to me how to do that
reliably without introducing a new state flag.
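
To illustrate what such a flag would have to do, here is a minimal
userspace model. It is not kernel code: struct model_ctrl, the
mpath_initialized flag and the model_*() functions are all hypothetical
stand-ins, and the real nvme_mpath_init() and nvme_mpath_stop() do more
than this. It only sketches the first-time vs. re-init distinction.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the controller; NOT the kernel's struct nvme_ctrl. */
struct model_ctrl {
	bool mpath_initialized;	/* hypothetical new state flag */
	bool ana_work_pending;	/* models an ana_work item that may still run */
	int lock_inits;		/* how many times the ANA lock was initialized */
	int stops;		/* how many times the stop path was taken */
};

/* Models the role of nvme_mpath_stop(): quiesce outstanding ANA work. */
void model_mpath_stop(struct model_ctrl *ctrl)
{
	ctrl->ana_work_pending = false;
	ctrl->stops++;
}

/* Models an init path that tells first-time init apart from re-init. */
void model_mpath_init(struct model_ctrl *ctrl)
{
	if (ctrl->mpath_initialized) {
		/* Re-init: stop old work first and leave the lock alone. */
		model_mpath_stop(ctrl);
	} else {
		/* First call for this controller: one-time setup only. */
		ctrl->lock_inits++;
		ctrl->mpath_initialized = true;
	}
	/* Both paths end by (re)starting ANA handling. */
	ctrl->ana_work_pending = true;
}
```

Calling model_mpath_init() twice on a zero-initialized struct model_ctrl
initializes the lock once and takes the stop path once, on the second
call. The flag is exactly the extra state I'd rather not have to add.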

Regards,
Martin




