NVMeoF: multipath stuck after bringing one ethernet port down

Mon Jun 5 01:40:37 PDT 2017

On Tue, May 30, 2017 at 05:17:40PM +0300, Sagi Grimberg wrote:
> [PATCH] nvme-rdma: fast fail incoming requests while we reconnect
>
> When we encounter transport/controller errors, error recovery
> kicks in, which performs:
> 1. stops io/admin queues
> 2. moves transport queues out of LIVE state
> 3. fast fail pending io
> 4. schedule periodic reconnects.
>
> But we also need to fast fail incoming IO that enters after we
> have already scheduled the reconnect. Given that our queue is not
> LIVE anymore, simply restart the request queues to fail in .queue_rq

But we shouldn't _fail_ I/O just because we're reconnecting; we
need to be able to retry it once reconnected.

> +                   cmd->fabrics.fctype != nvme_fabrics_type_connect) {
> +                       if (queue->ctrl->ctrl->state == NVME_CTRL_RECONNECTING)
> +                               return -EIO;
> +                       else
> +                               return -EAGAIN;
> +               }

So this looks somewhat bogus to me, while the rest looks ok.
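
To make the objection concrete, here is a rough userspace sketch of the two behaviours, not kernel code. All names in it (ctrl_state, ready_as_quoted, ready_requeue_only, dispose) are hypothetical. The mapping of -EAGAIN to "requeue" and of any other error to "fail" is an assumption about how the caller treats the return value; the quoted hunk does not show that part.

```c
/* Userspace model only; every identifier here is hypothetical. */
#include <errno.h>
#include <stdbool.h>

enum ctrl_state { CTRL_LIVE, CTRL_RECONNECTING, CTRL_DELETING };

enum disposition { DISP_DISPATCH, DISP_REQUEUE, DISP_FAIL };

/*
 * Mirrors the quoted hunk: a non-connect command arriving on a queue
 * that is not LIVE gets -EIO while reconnecting and -EAGAIN otherwise.
 */
int ready_as_quoted(bool queue_live, enum ctrl_state state, bool is_connect)
{
	if (queue_live || is_connect)
		return 0;
	return state == CTRL_RECONNECTING ? -EIO : -EAGAIN;
}

/*
 * Alternative along the lines of the reply: never fail just because a
 * reconnect is in progress, so the request can be retried afterwards.
 */
int ready_requeue_only(bool queue_live, bool is_connect)
{
	if (queue_live || is_connect)
		return 0;
	return -EAGAIN;
}

/* Assumed caller policy: 0 dispatches, -EAGAIN requeues, anything else fails. */
enum disposition dispose(int rc)
{
	if (rc == 0)
		return DISP_DISPATCH;
	if (rc == -EAGAIN)
		return DISP_REQUEUE;
	return DISP_FAIL;
}
```

Under these assumptions, dispose(ready_as_quoted(false, CTRL_RECONNECTING, false)) yields DISP_FAIL, while dispose(ready_requeue_only(false, false)) yields DISP_REQUEUE, which is the retry-after-reconnect behaviour asked for above.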