Unexpected issues with 2 NVME initiators using the same target

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Mon Jul 10 14:14:06 PDT 2017


On Mon, Jul 10, 2017 at 04:51:20PM -0400, Chuck Lever wrote:
> 
> > On Jul 10, 2017, at 4:05 PM, Jason Gunthorpe <jgunthorpe at obsidianresearch.com> wrote:
> > 
> > On Mon, Jul 10, 2017 at 03:03:18PM -0400, Chuck Lever wrote:
> > 
> >> One option is to somehow split the Send-related data structures from
> >> rpcrdma_req, and manage them independently. I've already done that for
> >> MRs: MR state is now located in rpcrdma_mw.
> > 
> > Yes, this is what I was implying.. Track the SQE related stuff
> > separately in memory allocated during SQ setup - MR, dma maps, etc.
> 
> > No need for an atomic/lock then, right? The required memory is bounded
> > since the inline send depth is bounded.
> 
> Perhaps I lack some imagination, but I don't see how I can manage
> these small objects without a serialized free list or circular
> array that would be accessed in the forward path and also in a
> Send completion handler.

I don't get it; dma unmap can only ever happen in the send completion
handler, never in the forward path. (This is the whole point of this
thread.)

Since you are not using send completion today, you can just use the
wr_id to point to the pre-allocated memory containing the pages to
invalidate. That completely removes dma unmap from the forward path.

Usually I work things out so that the metadata array is a ring and
every SQE post consumes a metadata entry. Then I occasionally signal
a completion, providing a wr_id of the latest ring index; the
completion handler runs through all the metadata accumulated since the
last completion and acts on it (eg unmaps/etc). This approach still
allows batching completions.

Since ring entries are bounded in size, we just preallocate the
largest size at QP creation. In this case it is some multiple of the
number of inline send pages times the number of SQE entries.
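The ring scheme above can be sketched roughly as follows. This is a
hypothetical user-space simulation, not xprtrdma code: the struct and
function names (`send_ring`, `ring_post`, `ring_complete`) are
illustrative, the `unmapped` counter stands in for the real dma-unmap
work, and real verbs posting/polling is elided. The point is only the
accounting: each post consumes one preallocated entry, and a single
signaled completion whose wr_id names the latest index drains every
entry accumulated since the previous completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SQ_DEPTH 16                  /* ring size == send queue depth */

struct send_meta {
	bool needs_unmap;            /* e.g. page-cache pages to dma-unmap */
	int  npages;                 /* recorded at post time */
};

struct send_ring {
	struct send_meta entries[SQ_DEPTH];
	uint32_t head;               /* next index consumed by a post */
	uint32_t tail;               /* first index not yet completed */
	int unmapped;                /* demo counter in lieu of dma_unmap */
};

/*
 * Forward path: consume one ring entry per SQE and return its index.
 * A signaled send would carry this index as its wr_id; unsignaled
 * sends just consume an entry.  No lock is shown: SQ accounting is
 * assumed to keep fewer than SQ_DEPTH posts outstanding.
 */
static uint32_t ring_post(struct send_ring *r, bool needs_unmap, int npages)
{
	uint32_t idx = r->head++ % SQ_DEPTH;

	r->entries[idx].needs_unmap = needs_unmap;
	r->entries[idx].npages = npages;
	return idx;
}

/*
 * Completion handler: wr_id names the latest posted index, so every
 * entry from tail through wr_id has completed.  Walk them in one
 * batch, "unmapping" the ones that recorded pages.
 */
static void ring_complete(struct send_ring *r, uint32_t wr_id)
{
	while (r->tail % SQ_DEPTH != (wr_id + 1) % SQ_DEPTH) {
		struct send_meta *m = &r->entries[r->tail % SQ_DEPTH];

		if (m->needs_unmap)
			r->unmapped += m->npages;
		r->tail++;
	}
}
```

With this shape, only the occasional signaled send touches the
completion path, yet every intervening unsignaled send still gets its
metadata processed, which is what makes the batching safe.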

> This seems like a lot of overhead to deal with a very uncommon
> case. I can reduce this overhead by signaling only Sends that
> need to unmap page cache pages, but still.

Yes, but it is not avoidable..

> As we previously discussed, xprtrdma does SQ accounting using RPC
> completion as the gate. Basically xprtrdma will send another RPC
> as soon as a previous one is terminated. If the Send WR is still
> running when the RPC terminates, I can potentially overrun the
> Send Queue.

Makes sense. The SQ accounting must be precise.

Jason


