kernel NULL pointer during reset_controller operation with IO on 4.11.0-rc7

Yi Zhang yizhan at redhat.com
Thu Aug 31 00:15:42 PDT 2017


> I couldn't repro it, but for some reason you got an overflow in the QP 
> send queue.
> seems like something might be wrong with the calculation (probably 
> signaling calculation).
>
> please supply more details:
> 1. link layer ?
> 2. HCA type + FW versions on target/host sides ?
> 3. B2B connection ?
>
> try this one as a first step:
>
Hi Max
I retested this issue on 4.13.0-rc6/4.13.0-rc7 without your patch and 
found it can no longer be reproduced.
Here is my environment:
link layer: mlx5 RoCE
HCA:
04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
05:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
05:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
Firmware:
[   13.489854] mlx5_core 0000:04:00.0: firmware version: 12.18.1000
[   14.360121] mlx5_core 0000:04:00.1: firmware version: 12.18.1000
[   15.091088] mlx5_core 0000:05:00.0: firmware version: 14.18.1000
[   15.936417] mlx5_core 0000:05:00.1: firmware version: 14.18.1000
The two servers are connected through a switch.

Will let you know and retest your patch if I reproduce it in the future.

Thanks
Yi

> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 82fcb07..1437306 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -88,6 +88,7 @@ struct nvme_rdma_queue {
>         struct nvme_rdma_qe     *rsp_ring;
>         atomic_t                sig_count;
>         int                     queue_size;
> +       int                     limit_mask;
>         size_t                  cmnd_capsule_len;
>         struct nvme_rdma_ctrl   *ctrl;
>         struct nvme_rdma_device *device;
> @@ -521,6 +522,7 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
>
>         queue->queue_size = queue_size;
>         atomic_set(&queue->sig_count, 0);
> +       queue->limit_mask = (min(32, 1 << ilog2((queue->queue_size + 1) / 2))) - 1;
>
>         queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue,
>                         RDMA_PS_TCP, IB_QPT_RC);
> @@ -1009,9 +1011,7 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
>   */
>  static inline bool nvme_rdma_queue_sig_limit(struct nvme_rdma_queue *queue)
>  {
> -       int limit = 1 << ilog2((queue->queue_size + 1) / 2);
> -
> -       return (atomic_inc_return(&queue->sig_count) & (limit - 1)) == 0;
> +       return (atomic_inc_return(&queue->sig_count) & (queue->limit_mask)) == 0;
>  }
>
>  static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
>
>
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
