kernel NULL pointer during reset_controller operation with IO on 4.11.0-rc7
Yi Zhang
yizhan at redhat.com
Thu Aug 31 00:15:42 PDT 2017
> I couldn't repro it, but for some reason you got an overflow in the QP
> send queue.
> Seems like something might be wrong with the calculation (probably the
> signaling calculation).
>
> please supply more details:
> 1. link layer ?
> 2. HCA type + FW versions on target/host sides ?
> 3. B2B connection ?
>
> try this one as a first step:
>
Hi Max,
I retested this issue on 4.13.0-rc6/4.13.0-rc7 without your patch and
found that it can no longer be reproduced.
Here is my environment:
link layer: mlx5 RoCE
HCA:
04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
Firmware:
[ 13.489854] mlx5_core 0000:04:00.0: firmware version: 12.18.1000
[ 14.360121] mlx5_core 0000:04:00.1: firmware version: 12.18.1000
[ 15.091088] mlx5_core 0000:05:00.0: firmware version: 14.18.1000
[ 15.936417] mlx5_core 0000:05:00.1: firmware version: 14.18.1000
The two servers are connected through a switch.
I will retest with your patch and let you know if I reproduce this issue
again in the future.
Thanks
Yi
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 82fcb07..1437306 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -88,6 +88,7 @@ struct nvme_rdma_queue {
> 	struct nvme_rdma_qe *rsp_ring;
> 	atomic_t sig_count;
> 	int queue_size;
> +	int limit_mask;
> 	size_t cmnd_capsule_len;
> 	struct nvme_rdma_ctrl *ctrl;
> 	struct nvme_rdma_device *device;
> @@ -521,6 +522,7 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
>
> 	queue->queue_size = queue_size;
> 	atomic_set(&queue->sig_count, 0);
> +	queue->limit_mask = (min(32, 1 << ilog2((queue->queue_size + 1) / 2))) - 1;
>
> 	queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue,
> 			RDMA_PS_TCP, IB_QPT_RC);
> @@ -1009,9 +1011,7 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
>  */
> static inline bool nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
> {
> -	int limit = 1 << ilog2((queue->queue_size + 1) / 2);
> -
> -	return (atomic_inc_return(&queue->sig_count) & (limit - 1)) == 0;
> +	return (atomic_inc_return(&queue->sig_count) & (queue->limit_mask)) == 0;
> }
>
> static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
>
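For illustration only (this is not part of Max's patch or the driver): the
small user-space program below mirrors the mask arithmetic from the patch,
using stand-in versions of the kernel's ilog2() and min() helpers. It prints
the precomputed limit_mask for a few queue sizes and shows that one send out
of every limit = mask + 1 posts requests a signaled completion, with limit
capped at 32 and never larger than (queue_size + 1) / 2. The names here
(sig_limit_mask, count_signaled) are made up for the example.

#include <stdio.h>

/* User-space stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static int ilog2_u(unsigned int v)
{
	int log = -1;

	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Mirrors the patch:
 * queue->limit_mask = (min(32, 1 << ilog2((queue->queue_size + 1) / 2))) - 1;
 */
static int sig_limit_mask(int queue_size)
{
	return min_int(32, 1 << ilog2_u((queue_size + 1) / 2)) - 1;
}

/* Mirrors nvme_rdma_queue_sig_limit(): count how many of `posts`
 * consecutive sends would request a signaled completion. */
static int count_signaled(int mask, int posts)
{
	unsigned int sig_count = 0;
	int i, signaled = 0;

	for (i = 0; i < posts; i++)
		if ((++sig_count & mask) == 0)
			signaled++;
	return signaled;
}

int main(void)
{
	int sizes[] = { 16, 32, 63, 128, 1024 };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		int mask = sig_limit_mask(sizes[i]);

		printf("queue_size=%4d  limit=%2d  limit_mask=0x%02x  signaled=%d/256\n",
		       sizes[i], mask + 1, (unsigned int)mask,
		       count_signaled(mask, 256));
	}
	return 0;
}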