Unexpected issues with 2 NVME initiators using the same target
Chuck Lever
chuck.lever at oracle.com
Tue Jun 20 13:56:39 PDT 2017
> On Jun 20, 2017, at 3:27 PM, Jason Gunthorpe <jgunthorpe at obsidianresearch.com> wrote:
>
> On Tue, Jun 20, 2017 at 02:17:39PM -0400, Chuck Lever wrote:
>
>> The concern is whether a retransmitted Send will be exposed
>> to the receiving ULP. Below you imply that it will not be, so
>> perhaps this is not a concern after all.
>
> A retransmitted SEND will never be exposed to the Receiver ULP for
> Reliable Connected. That is part of the guarantee.
>
>>> We've had this discussion on the list before. You can *never* re-use a
>>> SEND, or RDMA WRITE buffer until you observe the HCA is done with it
>>> via a CQ poll.
>>
>> RPC-over-RDMA is careful to invalidate buffers that are the
>> target of RDMA Write before RPC completion, as we have
>> discussed before.
>>
>> Sends are assumed to be complete when a LocalInv completes.
>>
>> When we had this discussion before, you explained the problem
>> with retransmitted Sends, but it appears that all the ULPs we
>> have operate without Send completion. Others whom I trust have
>> suggested that operating without that extra interrupt is
>
> Operating without the interrupt is of course preferred, but that means
> you have to defer the invalidate for MRs referred to by SEND until a
> CQ observation as well.
>
>> preferred. The client has operated this way since it was added
>> to the kernel almost 10 years ago.
>
> I thought the use of MRs with SEND was a new invention? If you use
> the local rdma lkey with send, it is never invalidated, and this is
> not an issue, which IIRC, was the historical configuration for NFS.

We may be conflating things a bit.

RPC-over-RDMA client uses persistently registered buffers, using
the lkey, for inline data. The use of MRs is reserved for NFS READ
and WRITE payloads. The inline buffers are never explicitly
invalidated by RPC-over-RDMA.

>> So I took it as a "in a perfect world" kind of admonition.
>> You are making a stronger and more normative assertion here.
>
> All ULPs must have periodic (related to SQ depth) signaled completions
> or some of our supported hardware will explode.

RPC-over-RDMA client does that.

> All ULPs must flow control additions to the SQ based on CQ feedback,
> or they will fail under load with SQ overflows, if this is done, then
> the above happens correctly for free.

RPC-over-RDMA client does that.
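
To make the two quoted requirements concrete, here is a minimal
standalone sketch of SQ accounting: a post is refused when no entry
is free, every Nth WR is signaled, and entries are returned only when
a signaled completion is polled from the CQ. This is a model, not the
xprtrdma code. The names sq_model, sq_init, sq_post and sq_complete
are hypothetical, and signaling twice per SQ depth is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a send queue (SQ); not the xprtrdma structures. */
struct sq_model {
	int depth;        /* total SQ entries */
	int avail;        /* entries the ULP may still post into */
	int unsignaled;   /* WRs posted since the last signaled WR */
	int signal_every; /* signal at least once per this many WRs */
};

static void sq_init(struct sq_model *sq, int depth)
{
	sq->depth = depth;
	sq->avail = depth;
	sq->unsignaled = 0;
	/* Assumption: signaling twice per SQ depth is frequent enough. */
	sq->signal_every = depth > 1 ? depth / 2 : 1;
}

/*
 * Reserve one SQ entry. Returns false if the SQ is full, in which case
 * the caller must wait for a completion. On success, *signaled says
 * whether this WR should request a completion.
 */
static bool sq_post(struct sq_model *sq, bool *signaled)
{
	if (sq->avail == 0)
		return false;
	sq->avail--;
	sq->unsignaled++;
	*signaled = sq->unsignaled >= sq->signal_every;
	if (*signaled)
		sq->unsignaled = 0;
	return true;
}

/*
 * Called when a signaled completion is polled from the CQ. It retires
 * the signaled WR plus the unsignaled WRs posted before it.
 */
static void sq_complete(struct sq_model *sq)
{
	sq->avail += sq->signal_every;
	assert(sq->avail <= sq->depth);
}
```

With a depth of 4 in this model, every second WR is signaled and a
fifth post is refused until sq_complete() has run once.
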
> All ULPs must ensure SEND/RDMA Write resources remain stable until the
> CQ indicates that work is completed. 'In a perfect world' this
> includes not changing the source memory as that would cause
> retransmitted packets to be different.

I assume you mean the sending side (the server) for RDMA
Write. I believe rdma_rw uses the local rdma lkey by default
for RDMA Write source buffers.

> All ULPs must ensure the lkey remains valid until the CQ confirms
> the work is done. This is not important if the lkey is always the
> local rdma lkey, which is always valid.

As above, Send buffers use the local rdma lkey.

>>> No. The SQ side is asynchronous to the CQ side, the HCA will pipeline
>>> send packets on the wire up to some internal limit.
>>
>> So if my ULP issues FastReg followed by Send followed by
>> LocalInv (signaled), I can't rely on the LocalInv completion
>> to imply that the Send is also complete?
>
> Correct.
>
> This is explicitly defined in Table 79 of the IBA.
>
> It describes the ordering requirements, if you order Send followed by
> LocalInv the ordering is 'L' which means they are not ordered unless
> the WR has the Local Invalidate Fence bit set.
>
> LIF is an optional feature, I do not know if any of our hardware
> supports it, but it is defined to cause the local invalidate to wait
> until all ongoing references to the MR are completed.

Now, since there was confusion about using an MR for a
Send operation, let me clarify. If the client does:

    FastReg(payload buffer)
    Send(inline buffer)
    ...
    Recv
    LocalInv(payload buffer)
    wait for LI completion

Is setting IB_SEND_FENCE on the LocalInv enough to ensure
that the Send is complete?

cscope seems to suggest all our devices support IB_SEND_FENCE.
Sagi mentioned some devices do this fencing automatically.

> No idea on the relative performance of LIF vs doing it manually, but
> the need for one or the other is unambiguously clear in the spec.

It seems to me that the guarantee that the server sees
only one copy of the Send payload is good enough. That
means that by the time Recv completion occurs on the
client, even if the client HCA still thinks it needs to
retransmit the Send containing the RPC Call, the server
ULP has already seen and processed that Send payload,
and the HCA on the server won't deliver that payload a
second time.

The RPC Reply is evidence that the server saw the correct
RPC Call message payload, and the client always preserves
the Send's inline buffer until the reply has been received.

If the only concern about preserving that inline buffer is
guaranteeing that retransmits contain the same content, I
don't think we have a problem. All HCA retransmits of an
RPC Call, until the matching RPC Reply is received on the
client, will contain the same content.

The concern about the HCA not being able to access the
inline buffer during a retransmit also does not apply to
RPC-over-RDMA, because these buffers are always registered
with the local rdma lkey.

> Why are you invalidating lkeys anyhow, that doesn't seem like something
> that needs to happen synchronously.

--
Chuck Lever