Unexpected issues with 2 NVME initiators using the same target
Robert LeBlanc
robert at leblancnet.us
Tue Jun 20 07:41:09 PDT 2017
On Tue, Jun 20, 2017 at 1:58 AM, Sagi Grimberg <sagi at grimberg.me> wrote:
>
>>> Hi Robert,
>>>
>>>> I ran into this with 4.9.32 when I rebooted the target. I tested
>>>> 4.12-rc6 and this particular error seems to have been resolved, but I
>>>> now get a new one on the initiator. This one doesn't seem as
>>>> impactful.
>>>>
>>>> [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
>>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>>> [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2
>>>
>>>
>>> Max, Leon,
>>>
>>> Care to parse this syndrome for us? ;)
>>
>>
>> Here is the parsed output; it says that there was an access to an mkey
>> that is free.
>>
>> ======== cqe_with_error ========
>> wqe_id : 0x0
>> srqn_usr_index : 0x0
>> byte_cnt : 0x0
>> hw_error_syndrome : 0x93
>> hw_syndrome_type : 0x0
>> vendor_error_syndrome : 0x52
>
>
> Can you share the check that correlates to the vendor+hw syndrome?
>
>> syndrome : LOCAL_PROTECTION_ERROR (0x4)
>> s_wqe_opcode : SEND (0xa)
>
>
> That's interesting, the opcode is a send operation. I'm assuming
> that this is an immediate-data write? Robert, did this happen when
> you issued >4k writes to the target?
I was running dd with oflag=direct, so yes.
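
For anyone who wants to decode a dump like the one quoted above by hand, here is a minimal Python sketch that pulls the fields out of the last 16 bytes of the error CQE. The byte offsets are an assumption: they are inferred from the parsed output in this thread and from struct mlx5_err_cqe in the kernel's mlx5 headers, and the hw_error_syndrome and hw_syndrome_type positions in particular sit in a region that struct marks as reserved, so treat them as a guess. The function name decode_err_cqe_tail is made up for illustration.

```python
# Sketch only: offsets are assumed from the parsed output in this thread and
# from struct mlx5_err_cqe; they are not taken from vendor documentation.

def decode_err_cqe_tail(words):
    """Decode the last 16 bytes (four big-endian 32-bit words) of a
    64-byte mlx5 error CQE as printed by dump_cqe."""
    raw = b"".join(int(w, 16).to_bytes(4, "big") for w in words)
    if len(raw) != 16:
        raise ValueError("expected exactly four 32-bit words")
    return {
        # Bytes 4-5 of this row fall in a reserved area of the struct;
        # treating them as the hw syndrome fields is an assumption.
        "hw_error_syndrome": raw[4],
        "hw_syndrome_type": raw[5],
        "vendor_error_syndrome": raw[6],
        "syndrome": raw[7],
        # s_wqe_opcode_qpn: top byte is the send WQE opcode, low 24 bits the QPN.
        "s_wqe_opcode": raw[8],
        "qpn": int.from_bytes(raw[9:12], "big"),
        "wqe_counter": int.from_bytes(raw[12:14], "big"),
        "signature": raw[14],
        "op_own": raw[15],
    }

# Last row of the dump from the initiator log.
fields = decode_err_cqe_tail("00000000 93005204 0a0001bd 45c8e0d2".split())
for name, value in fields.items():
    print(f"{name:22s}: {value:#x}")
```

With the row from my log this gives hw_error_syndrome 0x93, vendor_error_syndrome 0x52, syndrome 0x4 and s_wqe_opcode 0xa, which matches the parsed output above.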
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1