[PATCH rdma-next 4/4] nvme-rdma: add more error details when a QP moves to an error state

Leon Romanovsky leon at kernel.org
Wed Sep 7 05:51:16 PDT 2022


On Wed, Sep 07, 2022 at 03:34:21PM +0300, Sagi Grimberg wrote:
> 
> > From: Israel Rukshin <israelr at nvidia.com>
> > 
> > Add debug prints for fatal QP events that are helpful for finding the
> > root cause of the errors. The ib_get_qp_err_syndrome is called at
> > a work queue since the QP event callback is running on an
> > interrupt context that can't sleep.
> > 
> > Signed-off-by: Israel Rukshin <israelr at nvidia.com>
> > Reviewed-by: Max Gurtovoy <mgurtovoy at nvidia.com>
> > Reviewed-by: Leon Romanovsky <leonro at nvidia.com>
> 
> What makes nvme-rdma special here? Why do you get this in
> nvme-rdma and not srp/iser/nfs-rdma/rds/smc/ipoib etc?
> 
> This entire code needs to move to the rdma core instead
> of being leaked to ulps.

We can move it, but you will lose the connection between the queue number,
the caller and the error itself.

As I answered Christoph, we will need to execute the query QP command
in a workqueue, outside of the event handler.

So you will get a print about the queue being in an error state, and later
you will see the parsed error print somewhere in dmesg.
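
To illustrate the ordering, here is a minimal userspace sketch in C. It is
not kernel code and not the patch itself: every name in it is hypothetical,
the "workqueue" is a plain array, and the syndrome value is made up. It only
shows that the event handler logs and defers, while the query runs later, so
the two prints end up separated in the log. Carrying the queue index in the
work item is what keeps the second print tied to the first.

```c
/* Userspace simulation only: no kernel or RDMA APIs are used. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PENDING 8

/* Hypothetical work item: remembers which queue raised the event. */
struct qp_err_work {
	int queue_idx;
};

static struct qp_err_work pending[MAX_PENDING];
static size_t n_pending;

/* Hypothetical stand-in for a query QP command that may sleep. */
static unsigned int query_qp_syndrome(int queue_idx)
{
	assert(queue_idx >= 0);
	return 0x100u + (unsigned int)queue_idx; /* made-up value */
}

/*
 * Event handler: models a context that can't sleep, so it only
 * prints and defers. Returns 0 on success, -1 if the list is full.
 */
static int qp_event_handler(int queue_idx)
{
	if (n_pending >= MAX_PENDING)
		return -1;
	printf("queue %d: QP moved to error state\n", queue_idx);
	pending[n_pending++].queue_idx = queue_idx;
	return 0;
}

/*
 * Deferred work: runs later, where sleeping is allowed.
 * Returns the number of work items processed.
 */
static size_t run_deferred_work(void)
{
	size_t done = 0;

	for (size_t i = 0; i < n_pending; i++) {
		int q = pending[i].queue_idx;

		printf("queue %d: error syndrome 0x%x\n",
		       q, query_qp_syndrome(q));
		done++;
	}
	n_pending = 0;
	return done;
}
```

Calling qp_event_handler(3) and then run_deferred_work() prints the
"moved to error state" line first and the syndrome line second; in a real
log, unrelated messages could land between the two.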

Thanks


