Unexpected issues with 2 NVME initiators using the same target

Bart Van Assche Bart.VanAssche at wdc.com
Tue Jun 27 11:08:57 PDT 2017


On Tue, 2017-06-27 at 10:37 +0300, Sagi Grimberg wrote:
> Jason,
> 
> > > The issue about the HCA not being able to access the inline
> > > buffer during a retransmit is also not an issue for RPC-
> > > over-RDMA because these buffers are always registered with
> > > the local rdma lkey.
> > 
> > Exactly.
> 
> Lost track of the thread...
> 
> 
> Indeed you raised this issue lots of times before, and I failed to see
> why it's important or why it's error prone, but now I do...
> 
> My apologies for not listening :(
> 
> We should fix _all_ initiators for it, nvme-rdma, iser, srp
> and xprtrdma (and probably some more ULPs out there)...
> 
> It also means that we cannot really suppress any send completions as
> that would result in an unpredictable latency (which is not acceptable).
> 
> I wish we could somehow tell the HCA that it can ignore an access failure
> to a specific address when retransmitting.. but maybe it's too much to ask...

Hello Sagi,

Can you clarify why you think that the SRP initiator needs to be changed?
The SRP initiator submits the local invalidate work request after the RDMA
write request. According to table 79 "Work Request Operation Ordering" the
order of these work requests must be maintained by the HCA. If an HCA were
to invalidate the MR before the remote HCA has acknowledged the written
data, I think that would be a firmware bug.
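
To make the ordering argument concrete, here is a minimal sketch of a send
queue that completes work requests strictly in posting order. It is a toy
model and not the verbs API: every name in it (struct send_queue, sq_post,
sq_ack, sq_progress) is hypothetical, and it assumes a single MR and that an
RDMA write only completes once the remote side has acknowledged it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one send queue; not the verbs API. */
enum wr_op { WR_RDMA_WRITE, WR_LOCAL_INV };

struct wr {
    enum wr_op op;
    bool acked; /* RDMA write only: the remote ACK has arrived */
    bool done;  /* the work request has completed */
};

#define SQ_DEPTH 8

struct send_queue {
    struct wr wrs[SQ_DEPTH];
    size_t count;
    bool mr_valid; /* state of the single MR used by the writes */
};

static void sq_init(struct send_queue *sq)
{
    sq->count = 0;
    sq->mr_valid = true;
}

/* Post a work request; returns its index, or -1 if the queue is full. */
static int sq_post(struct send_queue *sq, enum wr_op op)
{
    if (sq->count == SQ_DEPTH)
        return -1;
    sq->wrs[sq->count] = (struct wr){ .op = op, .acked = false, .done = false };
    return (int)sq->count++;
}

/* Record that the remote side acknowledged the RDMA write at index idx. */
static void sq_ack(struct send_queue *sq, size_t idx)
{
    if (idx < sq->count && sq->wrs[idx].op == WR_RDMA_WRITE)
        sq->wrs[idx].acked = true;
}

/*
 * Process work requests in posting order and stop at the first one that
 * cannot complete yet. An unacknowledged RDMA write may still have to be
 * retransmitted, so the local invalidate queued behind it must wait.
 */
static void sq_progress(struct send_queue *sq)
{
    for (size_t i = 0; i < sq->count; i++) {
        struct wr *w = &sq->wrs[i];

        if (w->done)
            continue;
        if (w->op == WR_RDMA_WRITE) {
            if (!w->acked)
                return; /* later work requests stay pending */
            w->done = true;
        } else {
            sq->mr_valid = false; /* local invalidate takes effect */
            w->done = true;
        }
    }
}
```

In this model, posting WR_RDMA_WRITE followed by WR_LOCAL_INV and calling
sq_progress() leaves the MR valid until sq_ack() has been called for the
write. A queue that invalidated the MR earlier would correspond to the
behavior I would consider a firmware bug.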

The upstream SRP initiator does not use inline data.

Bart.

