Unexpected issues with 2 NVME initiators using the same target

Chuck Lever chuck.lever at oracle.com
Mon Jul 10 14:29:53 PDT 2017


> On Jul 10, 2017, at 5:24 PM, Jason Gunthorpe <jgunthorpe at obsidianresearch.com> wrote:
> 
> On Mon, Jul 10, 2017 at 03:03:18PM -0400, Chuck Lever wrote:
> 
>>>> Or I could revert all the "map page cache pages" logic and
>>>> just use memcpy for small NFS WRITEs, and RDMA the rest of
>>>> the time. That keeps everything simple, but means large
>>>> inline thresholds can't use send-in-place.
>>> 
>>> Don't you have the same problem with RDMA WRITE?
>> 
>> The server side initiates RDMA Writes. The final RDMA Write in a WR
>> chain is signaled, but a subsequent Send completion is used to
>> determine when the server may release resources used for the Writes.
>> We're already doing it the slow way there, and there's no ^C hazard
>> on the server.
> 
> Wait, I guess I meant the RDMA READ path.
> 
> The same constraints apply to RKeys as to inline send - you cannot DMA
> unmap rkey memory until the rkey is invalidated at the HCA.
> 
> So posting an invalidate SQE and then immediately unmapping the DMA
> pages is bad too.
> 
> No matter how the data is transferred, the unmapping must follow the
> same HCA-synchronous model: DMA unmap must only be done from the send
> completion handler (inline send or invalidate rkey), from the recv
> completion handler (send with invalidate), or from QP error state teardown.
> 
> Anything that does DMA memory unmap from another thread is very, very
> suspect, e.g. async from a ctrl-c trigger event.

In 4.13, the server side has been converted to use the rdma_rw API
for handling RDMA Read. For non-iWARP cases, it uses the
local DMA key for Read sink buffers; for iWARP it should
be using Read-with-invalidate (IIRC).
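
A rough sketch of that rdma_rw flow, with invented read_ctx/read_done
names rather than the actual svcrdma code (the rdma_rw_ctx_* calls are
the real <rdma/rw.h> API, which hides the local-DMA-key vs. MR choice
per device):

#include <rdma/rw.h>

/* Illustrative per-Read context; names and port number 1 are assumed. */
struct read_ctx {
	struct rdma_rw_ctx	rw;
	struct ib_cqe		cqe;
	struct ib_qp		*qp;
	struct scatterlist	*sgl;
	u32			sg_cnt;
};

/* Runs from the CQ handler once the last Read in the chain completes. */
static void read_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct read_ctx *ctx =
		container_of(wc->wr_cqe, struct read_ctx, cqe);

	/* Unmaps the sink pages and cleans up any MRs rdma_rw set up. */
	rdma_rw_ctx_destroy(&ctx->rw, ctx->qp, 1, ctx->sgl, ctx->sg_cnt,
			    DMA_FROM_DEVICE);
}

static int post_read(struct read_ctx *ctx, u64 remote_addr, u32 rkey)
{
	int ret;

	/* Maps the sink buffer; rdma_rw picks lkey vs. MR per transport. */
	ret = rdma_rw_ctx_init(&ctx->rw, ctx->qp, 1, ctx->sgl, ctx->sg_cnt,
			       0, remote_addr, rkey, DMA_FROM_DEVICE);
	if (ret < 0)
		return ret;

	ctx->cqe.done = read_done;
	/* Signals only the final WR; read_done fires when all data is in. */
	return rdma_rw_ctx_post(&ctx->rw, ctx->qp, 1, &ctx->cqe, NULL);
}

Note that rdma_rw_ctx_destroy() is what performs the DMA unmap, and it
runs only from the completion handler, consistent with the model above.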

--
Chuck Lever