[PATCH] nvme-rdma: Remove timeout for getting RDMA-CM established event

Sagi Grimberg sagi at grimberg.me
Tue May 17 03:11:28 PDT 2022


> In case many controllers start error recovery at the same time (e.g.,
> when a port goes down and comes back up), they may never succeed in
> reconnecting. This is because the target can't handle all the connect
> requests within three seconds (the arbitrary value set today). Even if
> some of the connections are established, when a single queue fails to
> connect, all of the controller's queues are destroyed as well. So, on
> the following reconnection attempts, the number of connect requests
> may remain the same. To fix this, remove the timeout and wait for an
> RDMA-CM event to abort or complete the connect request. RDMA-CM sends
> an unreachable event when a timeout of ~90 seconds expires. This
> approach is used by other RDMA-CM users, such as SRP and iSER, in
> blocking mode.

So with this, connecting to an unreachable controller will take 90
seconds?
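The trade-off behind this question (a fixed local connect timeout versus waiting until RDMA-CM itself reports "established" or "unreachable") can be illustrated with a small user-space sketch. This is not the nvme-rdma code: the names `cm_waiter`, `cm_wait` and `simulate_connect` are hypothetical, a helper thread stands in for the RDMA-CM event handler, and a pthread condition variable stands in for whatever wait primitive the driver actually uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical outcome codes; not taken from the nvme-rdma driver. */
enum cm_result { CM_ESTABLISHED, CM_UNREACHABLE, CM_TIMED_OUT };

struct cm_waiter {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;             /* set once a CM event has been delivered */
    enum cm_result result; /* which event was delivered */
};

struct delayed_event {
    struct cm_waiter *w;
    long delay_ms;         /* simulated time until the CM event arrives */
    enum cm_result event;
};

/* Event-handler side: record the first event and wake the waiter. */
static void cm_waiter_complete(struct cm_waiter *w, enum cm_result r) {
    pthread_mutex_lock(&w->lock);
    if (!w->done) {
        w->done = true;
        w->result = r;
    }
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Waits for a CM event. timeout_ms < 0 means no local timeout at all. */
static enum cm_result cm_wait(struct cm_waiter *w, long timeout_ms) {
    struct timespec deadline;
    enum cm_result r;

    if (timeout_ms >= 0) {
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec++;
            deadline.tv_nsec -= 1000000000L;
        }
    }
    pthread_mutex_lock(&w->lock);
    while (!w->done) {
        if (timeout_ms < 0)
            pthread_cond_wait(&w->cond, &w->lock);
        else if (pthread_cond_timedwait(&w->cond, &w->lock, &deadline) == ETIMEDOUT)
            break; /* give up locally, even if the peer is just slow */
    }
    r = w->done ? w->result : CM_TIMED_OUT;
    pthread_mutex_unlock(&w->lock);
    return r;
}

/* Simulated CM event source: sleeps, then delivers one event. */
static void *deliver_event(void *arg) {
    struct delayed_event *e = arg;
    struct timespec ts = { e->delay_ms / 1000, (e->delay_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    cm_waiter_complete(e->w, e->event);
    return NULL;
}

/* One connect attempt whose CM event arrives after delay_ms. */
static enum cm_result simulate_connect(long delay_ms, enum cm_result event,
                                       long timeout_ms) {
    struct cm_waiter w = { .done = false, .result = CM_TIMED_OUT };
    struct delayed_event e = { &w, delay_ms, event };
    pthread_t t;
    enum cm_result r;

    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.cond, NULL);
    pthread_create(&t, NULL, deliver_event, &e);
    r = cm_wait(&w, timeout_ms);
    pthread_join(t, NULL); /* the late event still arrives; wait it out */
    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.lock);
    return r;
}
```

With a finite `timeout_ms`, a target that is merely slow to answer (for example, one flooded with connect requests) is treated as a failure; with `timeout_ms < 0`, the caller returns only when an event arrives, so for an unreachable peer the wait is bounded only by RDMA-CM's own timeout (~90 seconds according to the patch description), which is the cost the question above points at.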


