Reconnect on RDMA device reset

Mon Jan 29 13:46:28 PST 2018

> On Jan 29, 2018, at 4:27 PM, Doug Ledford <dledford at redhat.com> wrote:
> 
> On Mon, 2018-01-29 at 15:11 -0500, Chuck Lever wrote:
>>> On Jan 29, 2018, at 3:01 PM, Sagi Grimberg <sagi at grimberg.me> wrote:
>>> 
>>> Hi Chuck,
>>> 
>>>> For NFS/RDMA, I think of the "failover" case where a device is
>>>> removed, then a new one is plugged in (or an existing cold
>>>> replacement is made available) with the same IP configuration.
>>>> On a "hard" NFS mount, we want the upper layers to wait for
>>>> a new suitable device to be made available, and then to use
>>>> it to resend any pending RPCs. The workload should continue
>>>> after a new device is available.
>>> 
>>> Really? So the context is held forever (in case the device never
>>> comes back)?
>> 
>> I didn't say this was the best approach :-) And it certainly can
>> change if we have something better.
> 
> Whether it's the best or not, it's the defined behavior of the "hard"
> mount option.  So if you don't want that, you don't use a hard
> mount ;-)
> 
> Hard mounts are great for situations where you have a high degree of
> faith that even if the server disappears, it will reappear soon.  They
> suck when the server totally dies though, because now all the hard mount
> clients are stuck :-/.

We're working on fixing that.

--
Chuck Lever