[PATCH rdma-next 4/4] nvme-rdma: add more error details when a QP moves to an error state
Leon Romanovsky
leon at kernel.org
Wed Sep 7 10:29:57 PDT 2022
On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
>
> > > > From: Israel Rukshin <israelr at nvidia.com>
> > > >
> > > > Add debug prints for fatal QP events that are helpful for finding the
> > > > root cause of the errors. The ib_get_qp_err_syndrome is called at
> > > > a work queue since the QP event callback is running on an
> > > > interrupt context that can't sleep.
> > > >
> > > > Signed-off-by: Israel Rukshin <israelr at nvidia.com>
> > > > Reviewed-by: Max Gurtovoy <mgurtovoy at nvidia.com>
> > > > Reviewed-by: Leon Romanovsky <leonro at nvidia.com>
> > >
> > > What makes nvme-rdma special here? Why do you get this in
> > > nvme-rdma and not srp/iser/nfs-rdma/rds/smc/ipoib etc?
> > >
> > > This entire code needs to move to the rdma core instead
> > > of being leaked to ulps.
> >
> > We can move it, but you will lose the connection between the queue
> > number, the caller and the error itself.
>
> That still doesn't explain why nvme-rdma is special.
It was important for us to get a proper review from at least one ULP;
nvme-rdma is not special at all.
>
> In any event, the ulp can log the qpn so the context can be interrogated
> if that is important.
ok