nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230

Engel, Amit Amit.Engel at Dell.com
Wed Jun 9 01:39:42 PDT 2021


I'm not sure that using the queue_lock mutex will help.
The race in this case is between sock_release and nvme_tcp_restore_sock_calls.
sock_release is called as part of nvme_tcp_free_queue, which also destroys the mutex.

-----Original Message-----
From: Sagi Grimberg <sagi at grimberg.me> 
Sent: Wednesday, June 9, 2021 11:05 AM
To: Engel, Amit; linux-nvme at lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230


> Hi Sagi,
> 
> Indeed, RHEL 8.3 does not have the mutex protection on
> nvme_tcp_stop_queue. However, in our case, based on the back trace
> below, we don't get to __nvme_tcp_stop_queue from nvme_tcp_stop_queue.
> We get to it from:
> nvme_tcp_reconnect_ctrl_work --> nvme_tcp_setup_ctrl -->
> nvme_tcp_start_queue --> __nvme_tcp_stop_queue
>
> So I'm not sure how this mutex protection will help in this case.


Oh, well, IIRC we probably need the same mutex protection in the start failure case then?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 216d21a6a165..00dff3654e6f 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1548,6 +1548,7 @@ static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
  static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
  {
         struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
+       struct nvme_tcp_queue *queue = &ctrl->queues[idx];
         int ret;

         if (idx)
@@ -1556,10 +1557,12 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
                 ret = nvmf_connect_admin_queue(nctrl);

         if (!ret) {
-               set_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags);
+               set_bit(NVME_TCP_Q_LIVE, &queue->flags);
         } else {
-               if (test_bit(NVME_TCP_Q_ALLOCATED, &ctrl->queues[idx].flags))
-                       __nvme_tcp_stop_queue(&ctrl->queues[idx]);
+               mutex_lock(&queue->queue_lock);
+               if (test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
+                       __nvme_tcp_stop_queue(queue);
+               mutex_unlock(&queue->queue_lock);
                 dev_err(nctrl->device,
                         "failed to connect queue: %d ret=%d\n", idx, ret);
         }
--
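
For readers following the thread, here is a minimal user-space sketch of the pattern the patch above applies: the check of the queue state and the teardown run under one lock, on both the regular stop path and the start-failure path. Everything in it is an assumption made for illustration. The model_* names are hypothetical, a pthread mutex stands in for queue_lock, and a single boolean stands in for the NVME_TCP_Q_* flag bits (unlike the patch, the model clears it during teardown so the effect is easy to observe). It is not the driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in; not the kernel's struct nvme_tcp_queue. */
struct model_queue {
	pthread_mutex_t lock;  /* plays the role of queue_lock */
	bool allocated;        /* simplified stand-in for the queue flag bits */
	int teardown_count;    /* how many times the teardown body actually ran */
};

static void model_queue_init(struct model_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->allocated = true;
	q->teardown_count = 0;
}

/* Unlocked teardown body; the caller must hold q->lock and have checked state. */
static void model_stop_locked(struct model_queue *q)
{
	assert(q->allocated);  /* running teardown twice would be the bug */
	q->allocated = false;
	q->teardown_count++;
}

/* Regular stop path: check and teardown are one critical section. */
static void model_stop(struct model_queue *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->allocated)
		model_stop_locked(q);
	pthread_mutex_unlock(&q->lock);
}

/* Start path: on a failed connect, tear down under the same lock. */
static int model_start(struct model_queue *q, int connect_ret)
{
	if (connect_ret) {
		pthread_mutex_lock(&q->lock);
		if (q->allocated)
			model_stop_locked(q);
		pthread_mutex_unlock(&q->lock);
	}
	return connect_ret;
}
```

With both paths taking the same lock, a failed model_start() followed by model_stop() runs the teardown body only once. The sketch does not model the nvme_tcp_free_queue path raised at the top of this thread, where the socket is released and the mutex itself is destroyed, so it says nothing about whether the patch closes that race.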

