Unexpected issues with 2 NVME initiators using the same target

Marta Rybczynska mrybczyn at kalray.eu
Tue Apr 11 05:47:26 PDT 2017


----- Original Message -----
> On 4/10/2017 2:40 PM, Marta Rybczynska wrote:
>>> On 3/17/2017 8:37 PM, Gruher, Joseph R wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Max Gurtovoy [mailto:maxg at mellanox.com]
>>>>>
>>>>> I think we need to add fence to the UMR wqe.
>>>>>
>>>>> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
>>>>> index ad8a263..c38c4fa 100644
>>>>> --- a/drivers/infiniband/hw/mlx5/qp.c
>>>>> +++ b/drivers/infiniband/hw/mlx5/qp.c
>>>>> @@ -3737,8 +3737,7 @@ static void dump_wqe(struct mlx5_ib_qp *qp, int idx, int size_16)
>>>>>
>>>>>   static u8 get_fence(u8 fence, struct ib_send_wr *wr)
>>>>>   {
>>>>> -       if (unlikely(wr->opcode == IB_WR_LOCAL_INV &&
>>>>> -                    wr->send_flags & IB_SEND_FENCE))
>>>>> +       if (wr->opcode == IB_WR_LOCAL_INV || wr->opcode == IB_WR_REG_MR)
>>>>>                  return MLX5_FENCE_MODE_STRONG_ORDERING;
>>>>>
>>>>>          if (unlikely(fence)) {
>>>>>
>>>>>
>>>>
>>>> You mention these patches are only for testing.  How do we get to something
>>>> which can be submitted to upstream?
>>>
>>> Yes, we need to be careful and not put the strong_fence if it's not a must.
>>> I'll be out for the upcoming week, but I'll ask our mlx5 maintainers to
>>> prepare a suitable patch and check some other applications performance
>>> numbers.
>>> Thanks for the testing, you can use this patch meanwhile till we push
>>> the formal solution.
>>>
>>
>> Hello Max,
>> We're seeing the same issue in our setup and we've been running this
>> patch on our system for some time. It seems to have fixed the issue.
>> When a final patch is available, we can test it too.
>>
>> Thanks,
>> Marta
>>
> 
> Hi Marta,
> thanks for testing my patch. I'll send it early next week (holidays in
> our country) so it will hopefully be available in the 4.11 kernel.
> If you can share which NICs you tested it on and the perf numbers you
> get (with and without the patch), that would be great.
> 

Hello Max,
That's great news. We're ready to start testing as soon as the final
version of the patch is out. We're using ConnectX-4. Unfortunately, in
our use case the patch is not a question of performance: the workload
doesn't work at all without it. We could look at running other
workloads to get performance numbers for the tests.
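
For reference, here is a minimal sketch of what the test patch quoted
above changes in the fence decision. It is a standalone model, not the
mlx5 driver code: the enum names, the flag value and the struct are
hypothetical stand-ins for the kernel definitions, and the rest of
get_fence() (the part after the first "if") is left out because it is
not shown in the diff.

```c
/* Hypothetical stand-ins for the kernel types; values are illustrative only. */
enum wr_opcode { WR_SEND, WR_LOCAL_INV, WR_REG_MR };
enum fence_mode { FENCE_NONE, FENCE_STRONG_ORDERING };

#define SEND_FENCE 0x1u /* assumed flag bit, stands in for IB_SEND_FENCE */

struct send_wr {
    enum wr_opcode opcode;
    unsigned int send_flags;
};

/* Old check: only a LOCAL_INV that also carries the fence flag is
 * posted with strong ordering. */
static enum fence_mode fence_before(const struct send_wr *wr)
{
    if (wr->opcode == WR_LOCAL_INV && (wr->send_flags & SEND_FENCE))
        return FENCE_STRONG_ORDERING;
    return FENCE_NONE; /* remaining logic omitted in this sketch */
}

/* Test patch: every LOCAL_INV and every REG_MR is posted with strong
 * ordering, regardless of the send flags. */
static enum fence_mode fence_after(const struct send_wr *wr)
{
    if (wr->opcode == WR_LOCAL_INV || wr->opcode == WR_REG_MR)
        return FENCE_STRONG_ORDERING;
    return FENCE_NONE; /* remaining logic omitted in this sketch */
}
```

In this model a REG_MR work request without the fence flag is unfenced
with the old check and strongly ordered with the patched one, which is
the case that matters for our workload.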

Marta


