[PATCH rdma-next 4/4] nvme-rdma: add more error details when a QP moves to an error state

Sagi Grimberg sagi at grimberg.me
Wed Sep 7 08:16:05 PDT 2022


>>> From: Israel Rukshin <israelr at nvidia.com>
>>>
>>> Add debug prints for fatal QP events that are helpful for finding the
>>> root cause of the errors. ib_get_qp_err_syndrome() is called from
>>> a workqueue because the QP event callback runs in interrupt
>>> context, which can't sleep.
>>>
>>> Signed-off-by: Israel Rukshin <israelr at nvidia.com>
>>> Reviewed-by: Max Gurtovoy <mgurtovoy at nvidia.com>
>>> Reviewed-by: Leon Romanovsky <leonro at nvidia.com>
>>
>> What makes nvme-rdma special here? Why do you get this in
>> nvme-rdma and not srp/iser/nfs-rdma/rds/smc/ipoib etc.?
>>
>> This entire code needs to move to the RDMA core instead
>> of leaking into the ULPs.
> 
> We can move it, but you will lose the connection between the queue
> number, the caller and the error itself.

That still doesn't explain why nvme-rdma is special.

In any event, the ULP can log the QPN so that the context can be
interrogated if that is important.
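For illustration only: a minimal userspace C model of the deferral described in the patch, assuming POSIX threads. A pthread condition variable stands in for the kernel workqueue, and query_err_syndrome(), qp_event_cb(), err_worker(), run_demo() and DEMO_EVENT_QP_FATAL are hypothetical names, not rdma core APIs. The callback never calls the (possibly blocking) query; it only records the event under a short lock and wakes a worker, roughly the role queue_work() plays from interrupt context. Each log line carries the QPN so it can be correlated with the ULP's queue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical event code; not the value of any real ib_event_type. */
#define DEMO_EVENT_QP_FATAL 1
#define MAX_PENDING 16

/* One deferred item: the QPN is kept so the log can be correlated. */
struct qp_err_work {
	unsigned int qpn;
	int event;
};

static struct qp_err_work pending[MAX_PENDING];
static int n_pending, head, n_done;
static bool stopping;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static char last_log[128];

/* Hypothetical stand-in for a syndrome query that may block
 * (e.g. waiting on a device command), so it must not run in the callback. */
static void query_err_syndrome(unsigned int qpn, char *buf, size_t len)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

	nanosleep(&ts, NULL);
	snprintf(buf, len, "<syndrome for qpn 0x%x>", qpn);
}

/* Models the QP event callback: record the event and wake the worker.
 * Returns false if the pending queue is full. */
static bool qp_event_cb(unsigned int qpn, int event)
{
	bool queued = false;

	pthread_mutex_lock(&lock);
	if (n_pending < MAX_PENDING) {
		pending[n_pending++] = (struct qp_err_work){ .qpn = qpn, .event = event };
		queued = true;
	}
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
	return queued;
}

/* Worker thread: allowed to sleep, so it does the query and the logging. */
static void *err_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	for (;;) {
		while (head == n_pending && !stopping)
			pthread_cond_wait(&cond, &lock);
		if (head == n_pending)	/* stopping and fully drained */
			break;
		struct qp_err_work w = pending[head++];
		pthread_mutex_unlock(&lock);

		char syn[64];
		query_err_syndrome(w.qpn, syn, sizeof(syn));
		snprintf(last_log, sizeof(last_log),
			 "qpn 0x%x: fatal event %d, %s", w.qpn, w.event, syn);
		printf("%s\n", last_log);

		pthread_mutex_lock(&lock);
		n_done++;
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Raises two events, then stops the worker once it has drained the queue.
 * Returns the number of events processed, or -1 if the thread can't start. */
static int run_demo(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, err_worker, NULL) != 0)
		return -1;
	bool ok = qp_event_cb(0x1a2, DEMO_EVENT_QP_FATAL);
	ok = qp_event_cb(0x1a3, DEMO_EVENT_QP_FATAL) && ok;
	assert(ok);

	pthread_mutex_lock(&lock);
	stopping = true;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
	pthread_join(t, NULL);
	return n_done;
}
```

The trade-off is the same one the patch makes: the callback does constant, non-blocking work, and the detailed report arrives slightly after the event itself.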



More information about the Linux-nvme mailing list