nvme/rdma initiator stuck on reboot
Sagi Grimberg
sagi at grimberg.me
Fri Aug 19 01:58:33 PDT 2016
> Btw, in that case the patch is not actually correct, as even workqueue
> with a higher concurrency level MAY deadlock under enough memory
> pressure. We'll need separate workqueues to handle this case I think.
Steve, does it help if you run the delete on the system_long_wq [1]?
Note that I've seen forward-progress problems when teardown/reconnect
sequences share a workqueue with the rest of the system (mostly in SRP).
>> Yes? And the
>> reconnect worker was never completing? Why is that? Here are a few tidbits
>> about iWARP connections: address resolution == neighbor discovery. So if the
>> neighbor is unreachable, it will take a few seconds for the OS to give up and
>> fail the resolution. If the neigh entry is valid and the peer becomes
>> unreachable during connection setup, it might take 60 seconds or so for a
>> connect operation to give up and fail. So this is probably slowing the
>> reconnect thread down. But shouldn't the reconnect thread notice that a delete
>> is trying to happen and bail out?
>
> I think we should aim for a state machine that can detect this, but
> we'll have to see if that will end up in synchronization overkill.
The reconnect logic does take care of this state transition...
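To illustrate the idea only, here is a minimal userspace sketch. It is not the driver code: the names (ctrl_state, change_state, reconnect_work) and the exact set of allowed transitions are assumptions made up for this example. The point is that a reconnect worker requeues itself only while the controller is still in the reconnecting state, so a delete that wins the state change makes the worker bail out.

```c
#include <stdbool.h>

/* Hypothetical, simplified controller states (not the kernel's enum). */
enum ctrl_state { CTRL_LIVE, CTRL_RECONNECTING, CTRL_RESETTING, CTRL_DELETING };

struct ctrl {
	enum ctrl_state state;
};

/* Outcome of one run of the (modelled) reconnect worker. */
enum work_result { WORK_DONE_LIVE, WORK_REQUEUE, WORK_BAIL };

/*
 * Allow a transition only from states that may precede the new one.
 * The transition table is an assumption made for this sketch.
 */
static bool change_state(struct ctrl *c, enum ctrl_state new_state)
{
	bool ok = false;

	switch (new_state) {
	case CTRL_LIVE:
		ok = c->state == CTRL_RECONNECTING || c->state == CTRL_RESETTING;
		break;
	case CTRL_RECONNECTING:
		ok = c->state == CTRL_LIVE;
		break;
	case CTRL_RESETTING:
		ok = c->state == CTRL_LIVE || c->state == CTRL_RECONNECTING;
		break;
	case CTRL_DELETING:
		/* Delete wins over everything except a delete already in flight. */
		ok = c->state != CTRL_DELETING;
		break;
	}
	if (ok)
		c->state = new_state;
	return ok;
}

/*
 * One reconnect attempt. connect_ok stands in for the result of the
 * (possibly slow) connection setup. The worker goes live only if the
 * state change succeeds, and asks to be requeued only while the
 * controller is still reconnecting; otherwise it bails out.
 */
static enum work_result reconnect_work(struct ctrl *c, bool connect_ok)
{
	if (connect_ok && change_state(c, CTRL_LIVE))
		return WORK_DONE_LIVE;

	return c->state == CTRL_RECONNECTING ? WORK_REQUEUE : WORK_BAIL;
}
```

With this model, a failed attempt in the reconnecting state returns WORK_REQUEUE, while any attempt that finishes after a delete has moved the state to CTRL_DELETING returns WORK_BAIL, even if the connect itself succeeded. It says nothing about how long a single attempt may block, which is the part Steve describes above.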
[1]:
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8d2875b4c56d..93ea2831ff31 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1342,7 +1342,7 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 	}
 
 	/* Queue controller deletion */
-	queue_work(nvme_rdma_wq, &ctrl->delete_work);
+	queue_work(system_long_wq, &ctrl->delete_work);
 	flush_work(&ctrl->delete_work);
 	return ret;
 }
@@ -1681,7 +1681,7 @@ static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
 		return -EBUSY;
 
-	if (!queue_work(nvme_rdma_wq, &ctrl->delete_work))
+	if (!queue_work(system_long_wq, &ctrl->delete_work))
 		return -EBUSY;
 
 	return 0;
@@ -1763,7 +1763,7 @@ static int nvme_rdma_reset_ctrl(struct nvme_ctrl *nctrl)
 	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
 		return -EBUSY;
 
-	if (!queue_work(nvme_rdma_wq, &ctrl->reset_work))
+	if (!queue_work(system_long_wq, &ctrl->reset_work))
 		return -EBUSY;
 
 	flush_work(&ctrl->reset_work);
--