Unexpected issues with 2 NVME initiators using the same target
Sagi Grimberg
sagi at grimberg.me
Tue Jun 20 00:58:47 PDT 2017
>> Hi Robert,
>>
>>> I ran into this with 4.9.32 when I rebooted the target. I tested
>>> 4.12-rc6 and this particular error seems to have been resolved, but I
>>> now get a new one on the initiator. This one doesn't seem as
>>> impactful.
>>>
>>> [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>> [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2
>>
>> Max, Leon,
>>
>> Care to parse this syndrome for us? ;)
>
> Here is the parsed output; it says that there was an access to an mkey
> which is free.
>
> ======== cqe_with_error ========
> wqe_id : 0x0
> srqn_usr_index : 0x0
> byte_cnt : 0x0
> hw_error_syndrome : 0x93
> hw_syndrome_type : 0x0
> vendor_error_syndrome : 0x52
Can you share the check that correlates to the vendor+hw syndrome?
> syndrome : LOCAL_PROTECTION_ERROR (0x4)
> s_wqe_opcode : SEND (0xa)
That's interesting, the opcode is a send operation. I'm assuming
that this is an immediate-data write? Robert, did this happen when
you issued >4k writes to the target?
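
For anyone who wants to pull the fields out of a dump like the one above by hand, here is a minimal sketch in Python. It is not the tool Max and Leon used. The byte offsets are an assumption inferred from lining up the last row of the dump (93005204 0a0001bd) with the parsed output quoted above, the name decode_err_cqe is hypothetical, and treating the low 24 bits of the opcode word as the QP number is also an assumption. The two name tables only contain the values that appear in this thread.

```python
import struct

# The four rows printed after "dump error cqe", timestamps stripped.
DUMP = """\
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 93005204 0a0001bd 45c8e0d2
"""

# Only the values named in this thread; not a complete table.
SYNDROME_NAMES = {0x04: "LOCAL_PROTECTION_ERROR"}
OPCODE_NAMES = {0x0A: "SEND"}


def decode_err_cqe(dump_text):
    """Decode a 64-byte error CQE given as 16 big-endian hex words.

    Offsets are assumed from the parsed output in this thread:
      byte 52: hw_error_syndrome, byte 53: hw_syndrome_type,
      byte 54: vendor_error_syndrome, byte 55: syndrome,
      bytes 56-59: send opcode (top byte) and QP number (low 24 bits).
    """
    words = [int(w, 16) for w in dump_text.split()]
    if len(words) != 16:
        raise ValueError("expected 16 32-bit words, got %d" % len(words))
    raw = struct.pack(">16I", *words)

    hw_err, hw_type, vendor, syndrome = struct.unpack_from(">4B", raw, 52)
    (opcode_qpn,) = struct.unpack_from(">I", raw, 56)
    opcode = opcode_qpn >> 24

    return {
        "hw_error_syndrome": hw_err,
        "hw_syndrome_type": hw_type,
        "vendor_error_syndrome": vendor,
        "syndrome": syndrome,
        "syndrome_name": SYNDROME_NAMES.get(syndrome, "unknown"),
        "s_wqe_opcode": opcode,
        "s_wqe_opcode_name": OPCODE_NAMES.get(opcode, "unknown"),
        "qpn": opcode_qpn & 0xFFFFFF,  # assumption: low 24 bits are the QPN
    }


if __name__ == "__main__":
    for key, value in decode_err_cqe(DUMP).items():
        if isinstance(value, int):
            print("%-22s: 0x%x" % (key, value))
        else:
            print("%-22s: %s" % (key, value))
```

Run against the dump above, this gives the same hw_error_syndrome (0x93), vendor_error_syndrome (0x52), syndrome (0x4) and s_wqe_opcode (0xa) as the parsed output. It says nothing about which internal check the vendor and hw syndromes correspond to.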