Unexpected issues with 2 NVMe initiators using the same target
Chuck Lever
chuck.lever at oracle.com
Mon Jul 10 14:29:53 PDT 2017
> On Jul 10, 2017, at 5:24 PM, Jason Gunthorpe <jgunthorpe at obsidianresearch.com> wrote:
>
> On Mon, Jul 10, 2017 at 03:03:18PM -0400, Chuck Lever wrote:
>
>>>> Or I could revert all the "map page cache pages" logic and
>>>> just use memcpy for small NFS WRITEs, and RDMA the rest of
>>>> the time. That keeps everything simple, but means large
>>>> inline thresholds can't use send-in-place.
>>>
>>> Don't you have the same problem with RDMA WRITE?
>>
>> The server side initiates RDMA Writes. The final RDMA Write in a WR
>> chain is signaled, but a subsequent Send completion is used to
>> determine when the server may release resources used for the Writes.
>> We're already doing it the slow way there, and there's no ^C hazard
>> on the server.
>
> Wait, I guess I meant the RDMA READ path.
>
> The same constraints apply to RKeys as to inline send - you cannot DMA
> unmap rkey memory until the rkey has been invalidated at the HCA.
>
> So posting an invalidate SQE and then immediately unmapping the DMA
> pages is bad, too.
>
> No matter how the data is transferred, the unmapping must follow the
> same HCA-synchronous model: DMA unmap must only be done from the send
> completion handler (inline send or invalidate rkey), from the recv
> completion handler (send with invalidate), or from QP error state teardown.
>
> Anything that does DMA memory unmap from another thread is very, very
> suspect, e.g. asynchronously from a ctrl-c trigger event.
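
To make that rule concrete, here is a minimal sketch against the
kernel verbs CQ API. The names (my_ctx, my_inv_done, my_post_local_inv)
are hypothetical, and the same shape applies to the Write-chain + Send
completion case above: the DMA unmap happens only in the Local
Invalidate's completion handler, never in the thread that posted the WR.

#include <rdma/ib_verbs.h>

struct my_ctx {
	struct ib_cqe		cqe;
	struct ib_device	*dev;
	struct scatterlist	*sgl;
	int			nents;
};

/* Runs from the send CQ: the rkey is now invalid at the HCA,
 * so unmapping the pages is finally safe. */
static void my_inv_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct my_ctx *ctx = container_of(wc->wr_cqe, struct my_ctx, cqe);

	ib_dma_unmap_sg(ctx->dev, ctx->sgl, ctx->nents, DMA_FROM_DEVICE);
}

static int my_post_local_inv(struct ib_qp *qp, struct my_ctx *ctx, u32 rkey)
{
	struct ib_send_wr wr = { }, *bad_wr;

	ctx->cqe.done = my_inv_done;
	wr.wr_cqe = &ctx->cqe;
	wr.opcode = IB_WR_LOCAL_INV;
	wr.send_flags = IB_SEND_SIGNALED;
	wr.ex.invalidate_rkey = rkey;

	/* Do NOT unmap here; wait for my_inv_done() to run. */
	return ib_post_send(qp, &wr, &bad_wr);
}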
The 4.13 server side has been converted to use the rdma_rw API for
handling RDMA Read. For non-iWARP cases it uses the local DMA key
for Read sink buffers; for iWARP it should be using
Read-with-invalidate (IIRC).
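
For reference, a sketch of that pattern, assuming the 4.13-era interface
in include/rdma/rw.h (the my_* names are hypothetical). rdma_rw_ctx_init()
maps the sink buffer and, where the device needs it (e.g. iWARP),
registers an MR behind the caller's back; rdma_rw_ctx_destroy() undoes
both, which is why it is called only from the Read completion handler:

#include <rdma/rw.h>

struct my_read_ctx {
	struct rdma_rw_ctx	rw_ctx;
	struct ib_cqe		cqe;
	struct ib_qp		*qp;
	u8			port_num;
	struct scatterlist	*sgl;
	u32			nents;
};

/* Runs from the send CQ once the RDMA Read (and any implicit
 * invalidation) has completed. */
static void my_read_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct my_read_ctx *ctx =
		container_of(wc->wr_cqe, struct my_read_ctx, cqe);

	/* Unmaps the sink pages and releases any MR rdma_rw set up. */
	rdma_rw_ctx_destroy(&ctx->rw_ctx, ctx->qp, ctx->port_num,
			    ctx->sgl, ctx->nents, DMA_FROM_DEVICE);
}

static int my_post_read(struct my_read_ctx *ctx, u64 raddr, u32 rkey)
{
	int ret;

	/* Maps (and, if needed, registers) the sink buffer. */
	ret = rdma_rw_ctx_init(&ctx->rw_ctx, ctx->qp, ctx->port_num,
			       ctx->sgl, ctx->nents, 0, raddr, rkey,
			       DMA_FROM_DEVICE);
	if (ret < 0)
		return ret;

	ctx->cqe.done = my_read_done;
	return rdma_rw_ctx_post(&ctx->rw_ctx, ctx->qp, ctx->port_num,
				&ctx->cqe, NULL);
}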
--
Chuck Lever