Reconnect on RDMA device reset
Chuck Lever
chuck.lever at oracle.com
Mon Jan 29 13:46:28 PST 2018
> On Jan 29, 2018, at 4:27 PM, Doug Ledford <dledford at redhat.com> wrote:
>
> On Mon, 2018-01-29 at 15:11 -0500, Chuck Lever wrote:
>>> On Jan 29, 2018, at 3:01 PM, Sagi Grimberg <sagi at grimberg.me> wrote:
>>>
>>> Hi Chuck,
>>>
>>>> For NFS/RDMA, I think of the "failover" case where a device is
>>>> removed, then a new one is plugged in (or an existing cold
>>>> replacement is made available) with the same IP configuration.
>>>> On a "hard" NFS mount, we want the upper layers to wait for
>>>> a new suitable device to be made available, and then to use
>>>> it to resend any pending RPCs. The workload should continue
>>>> after a new device is available.
>>>
>>> Really? so the context is held forever (in case the device never
>>> comes back)?
>>
>> I didn't say this was the best approach :-) And it certainly can
>> change if we have something better.
>
> Whether it's the best or not, it's the defined behavior of the "hard"
> mount option. So if someone doesn't want that, you don't use a hard
> mount ;-)
>
> Hard mounts are great for situations where you have a high degree of
> faith that even if the server disappears, it will reappear soon. They
> suck when the server totally dies though, because now all the hard mount
> clients are stuck :-/.
We're working on fixing that.
--
Chuck Lever