[PATCH rdma-next 4/4] nvme-rdma: add more error details when a QP moves to an error state
Patrisious Haddad
phaddad at nvidia.com
Thu Sep 8 00:55:34 PDT 2022
On 07/09/2022 18:18, Christoph Hellwig wrote:
> On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
>>>>
>>>> This entire code needs to move to the rdma core instead
>>>> of being leaked to ulps.
>>>
>>> We can move, but you will lose connection between queue number,
>>> caller and error itself.
>>
>> That still doesn't explain why nvme-rdma is special.
>>
>> In any event, the ulp can log the qpn so the context can be interrogated
>> if that is important.
>
> I also don't see why the QP event handler can't be called
> from user context to start with. I see absolutely no reason to
> add boilerplate code to drivers for reporting slightly more verbose
> errors on one specific piece of hardware. I'd say clean up the mess
> that is the QP event handler first, and then once error reporting
> becomes trivial we can just do it.
I would like to emphasize that this is not just about a slightly more
verbose error. It is mainly about an error that would not be reported
at all without this feature. As I mentioned previously, there are error
cases in which the remote side doesn't generate a CQE; without this
feature, the remote side wouldn't even know why the QP was moved to the
error state.