Unexpected issues with 2 NVME initiators using the same target

Sagi Grimberg sagi at grimberg.me
Tue Jun 20 00:58:47 PDT 2017


>> Hi Robert,
>>
>>> I ran into this with 4.9.32 when I rebooted the target. I tested
>>> 4.12-rc6 and this particular error seems to have been resolved, but I
>>> now get a new one on the initiator. This one doesn't seem as
>>> impactful.
>>>
>>> [Mon Jun 19 11:17:20 2017] mlx5_0:dump_cqe:275:(pid 0): dump error cqe
>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>> [Mon Jun 19 11:17:20 2017] 00000000 00000000 00000000 00000000
>>> [Mon Jun 19 11:17:20 2017] 00000000 93005204 0a0001bd 45c8e0d2
>>
>> Max, Leon,
>>
>> Care to parse this syndrome for us? ;)
> 
> Here is the parsed output. It indicates an access to an mkey that is
> free.
> 
> ======== cqe_with_error ========
> wqe_id                           : 0x0
> srqn_usr_index                   : 0x0
> byte_cnt                         : 0x0
> hw_error_syndrome                : 0x93
> hw_syndrome_type                 : 0x0
> vendor_error_syndrome            : 0x52

Can you share the check that correlates to the vendor+hw syndrome?

> syndrome                         : LOCAL_PROTECTION_ERROR (0x4)
> s_wqe_opcode                     : SEND (0xa)

That's interesting: the opcode is a send operation. I'm assuming
that this is an immediate-data write? Robert, did this happen when
you issued >4k writes to the target?
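
For anyone following along, the way the parsed fields line up with the
last row of the raw dump can be sketched in a few lines of Python. This
is only a sketch: the byte offsets are inferred by matching the parsed
values quoted above against the raw words, not taken from a hardware
manual, the function name decode_error_cqe_tail is made up for
illustration, and the name tables contain only the two values that
appear in this thread.

```python
# Last row of the "dump error cqe" output quoted in this thread
# (the final 16 bytes of the dumped CQE).
LAST_ROW = "00000000 93005204 0a0001bd 45c8e0d2"

# Names copied from the parsed output in the thread; only values seen
# there are listed, everything else is reported as "unknown".
SYNDROMES = {0x04: "LOCAL_PROTECTION_ERROR"}
OPCODES = {0x0A: "SEND"}


def decode_error_cqe_tail(row: str) -> dict:
    """Decode selected fields from the last 16 bytes of an error CQE dump.

    Assumption: byte offsets are inferred from how the parsed output in
    the thread matches the raw words (0x93, 0x52, 0x04, 0x0a).
    """
    raw = bytes.fromhex(row.replace(" ", ""))
    if len(raw) != 16:
        raise ValueError("expected four 32-bit words (16 bytes)")

    syndrome = raw[7]  # low byte of the second word
    opcode = raw[8]    # high byte of the third word
    return {
        "hw_error_syndrome": raw[4],
        "vendor_error_syndrome": raw[6],
        "syndrome": syndrome,
        "syndrome_name": SYNDROMES.get(syndrome, "unknown"),
        "s_wqe_opcode": opcode,
        "s_wqe_opcode_name": OPCODES.get(opcode, "unknown"),
    }


if __name__ == "__main__":
    for name, value in decode_error_cqe_tail(LAST_ROW).items():
        shown = hex(value) if isinstance(value, int) else value
        print(f"{name:24}: {shown}")
```

Fields that are zero in the dump (wqe_id, srqn_usr_index, byte_cnt,
hw_syndrome_type) are left out on purpose, since their offsets cannot be
confirmed from this one sample.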


