Unexpected issues with 2 NVME initiators using the same target

Chuck Lever chuck.lever at oracle.com
Mon Jul 10 20:57:32 PDT 2017


> On Jul 10, 2017, at 6:09 PM, Jason Gunthorpe <jgunthorpe at obsidianresearch.com> wrote:
> 
> On Mon, Jul 10, 2017 at 06:04:18PM -0400, Chuck Lever wrote:
> 
>>> The server sounds fine, how does the client work?
>> 
>> The client does not initiate RDMA Read or Write today.
> 
> Right, but it provides an rkey that the server uses for READ or WRITE.
> 
> The invalidate of that rkey at the client must follow the same rules
> as inline send.

Ah, I see.

The RPC reply handler calls frwr_op_unmap_sync to invalidate
any MRs associated with the RPC.

frwr_op_unmap_sync has to sort the rkeys into those that
were remotely invalidated and those that were not.

The first step is to ensure all the rkeys for an RPC are
invalid. The rkey that was remotely invalidated is skipped
here, and a chain of LocalInv WRs is posted to invalidate
any remaining rkeys. The last WR in the chain is signaled.

If one or more LocalInv WRs are posted, this function waits
for LocalInv completion.

The last step is always DMA unmapping. Note that we can't
get a completion for a remotely invalidated rkey, and we
have to wait for LocalInv to complete anyway. So the DMA
unmapping is always handled here instead of in a
completion handler.
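
To make the sequencing concrete, here's a compressed sketch
of that flow. This is not the actual frwr_ops.c code: the
field and helper names (rl_registered, rl_inv_rkey,
rl_inv_done, mr_invwr, and so on) are invented for
illustration, and error handling is omitted.

#include <linux/completion.h>
#include <rdma/ib_verbs.h>

/* Sketch only -- illustrative names, no error handling. */
static void frwr_unmap_sync_sketch(struct rpcrdma_xprt *r_xprt,
				   struct rpcrdma_req *req)
{
	struct ib_send_wr *first = NULL, **prev = &first, *bad_wr;
	struct rpcrdma_mr *mr, *last = NULL;

	/* 1. Chain up a LocalInv WR for each MR, skipping the
	 *    one rkey the server already invalidated remotely.
	 */
	list_for_each_entry(mr, &req->rl_registered, mr_list) {
		if (mr->mr_handle == req->rl_inv_rkey)
			continue;
		memset(&mr->mr_invwr, 0, sizeof(mr->mr_invwr));
		mr->mr_invwr.opcode = IB_WR_LOCAL_INV;
		mr->mr_invwr.ex.invalidate_rkey = mr->mr_handle;
		*prev = &mr->mr_invwr;
		prev = &mr->mr_invwr.next;
		last = mr;
	}

	/* 2. Signal only the last WR in the chain; the send
	 *    completion handler does complete(&req->rl_inv_done).
	 */
	if (last) {
		last->mr_invwr.send_flags = IB_SEND_SIGNALED;
		reinit_completion(&req->rl_inv_done);
		if (!ib_post_send(r_xprt->rx_ia.ri_id->qp, first,
				  &bad_wr))
			wait_for_completion(&req->rl_inv_done);
	}

	/* 3. Every rkey is now invalid, so DMA unmapping is
	 *    safe, including for the remotely-invalidated MR,
	 *    which never generates a completion of its own.
	 */
	list_for_each_entry(mr, &req->rl_registered, mr_list)
		ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
				mr->mr_sg, mr->mr_nents, mr->mr_dir);
}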

When frwr_op_unmap_sync returns to the RPC reply handler,
the handler calls xprt_complete_rqst, and the RPC is
terminated. This guarantees that the MRs are invalid before
control is returned to the RPC consumer.
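
In other words, the tail end of the reply handler looks
roughly like this (again simplified, with the same invented
helper as above; the locking shown is paraphrased, not
quoted from the real rpcrdma_reply_handler):

/* Simplified; the real reply handler does much more. */
static void rpcrdma_reply_handler_tail(struct rpcrdma_rep *rep,
				       struct rpc_rqst *rqst)
{
	/* Invalidate and DMA unmap all of this RPC's MRs
	 * first...
	 */
	frwr_unmap_sync_sketch(rep->rr_rxprt, rpcr_to_rdmar(rqst));

	/* ...so every rkey is dead before the RPC consumer
	 * can touch (or reuse) the memory.
	 */
	spin_lock_bh(&rqst->rq_xprt->transport_lock);
	xprt_complete_rqst(rqst->rq_task, rep->rr_len);
	spin_unlock_bh(&rqst->rq_xprt->transport_lock);
}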


In the ^C case (the RPC was terminated by a signal),
frwr_op_unmap_safe is invoked during RPC termination. The
MRs are handed to the background recovery task, which
invokes frwr_op_recover_mr.

frwr_op_recover_mr destroys the fr_mr and DMA unmaps the
memory. (It's also used when a registration or invalidation
WR flushes, which is why it uses a hammer.)
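
Roughly, with the same caveats as the sketches above
(mr_xprt, ri_max_frwr_depth, and the mr fields are
illustrative names, not the real ones):

/* Sketch of the recovery hammer; not the actual code. */
static void frwr_recover_mr_sketch(struct rpcrdma_mr *mr)
{
	struct rpcrdma_ia *ia = &mr->mr_xprt->rx_ia;

	/* Destroying the ib_mr invalidates its rkey as a side
	 * effect, no matter what state the MR was left in.
	 */
	ib_dereg_mr(mr->frwr.fr_mr);

	/* Invalidation done; now DMA unmapping is safe. */
	ib_dma_unmap_sg(ia->ri_device, mr->mr_sg,
			mr->mr_nents, mr->mr_dir);

	/* Allocate a fresh ib_mr so this slot can be reused. */
	mr->frwr.fr_mr = ib_alloc_mr(ia->ri_pd, IB_MR_TYPE_MEM_REG,
				     ia->ri_max_frwr_depth);
}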

So here we're a little fast and loose: the ordering of
invalidation and unmapping is correct, but the MRs can be
invalidated after the RPC completes. Since RPC termination
can't wait, this is the best I can do for now.


--
Chuck Lever