[PATCH 1/2] nvme: fixup kato deadlock

Hannes Reinecke hare at suse.de
Tue Feb 23 07:07:27 EST 2021


A customer of ours has run into this deadlock with RDMA:
- The ka_work workqueue item is executed
- A new ka_work workqueue item is scheduled just after that.
- Now both the kato request timeout _and_ the workqueue delay
  will expire at roughly the same time
- If the timing is right, the workqueue item executes _before_
  the kato request timeout triggers
- The kato request timeout triggers and starts error recovery
- Error recovery deadlocks, as it needs to flush the kato
  workqueue item, which is stuck in nvme_alloc_request() because
  all reserved tags are in use.

The reserved tags would have been freed up later when cancelling all
outstanding requests in the queue:

	nvme_stop_keep_alive(&ctrl->ctrl);
	nvme_rdma_teardown_io_queues(ctrl, false);
	nvme_start_queues(&ctrl->ctrl);
	nvme_rdma_teardown_admin_queue(ctrl, false);
	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);

but as we're stuck in nvme_stop_keep_alive(), we never get this far.

To fix this, a new controller flag 'NVME_CTRL_KATO_RUNNING' is added,
which short-circuits the nvme_keep_alive() function if a keep-alive
command is already in flight.

Cc: Daniel Wagner <dwagner at suse.de>
Signed-off-by: Hannes Reinecke <hare at suse.de>
---
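Note (not part of the commit): the guard described above can be
illustrated with a small userspace sketch. This is not the kernel code.
C11 atomics stand in for test_and_set_bit()/clear_bit(), and every
fake_* name, the struct layout and the alloc_fails parameter are
hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for NVME_CTRL_KATO_RUNNING (bit number 1). */
#define FAKE_KATO_RUNNING	1

/* Hypothetical, stripped-down controller: only what the guard needs. */
struct fake_ctrl {
	atomic_ulong flags;
	int submitted;		/* keep-alive commands actually sent */
};

/* Atomically set bit nr and report whether it was already set. */
static bool fake_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Atomically clear bit nr. */
static void fake_clear_bit(unsigned int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/*
 * Mirrors the shape of the patched nvme_keep_alive(): return 0 without
 * allocating anything if a keep-alive is already in flight, and drop
 * the flag again if the (simulated) allocation fails.
 */
static int fake_keep_alive(struct fake_ctrl *ctrl, bool alloc_fails)
{
	if (fake_test_and_set_bit(FAKE_KATO_RUNNING, &ctrl->flags))
		return 0;

	if (alloc_fails) {
		fake_clear_bit(FAKE_KATO_RUNNING, &ctrl->flags);
		return -1;
	}

	ctrl->submitted++;
	return 0;
}

/* Mirrors the completion path: the next keep-alive may be sent again. */
static void fake_keep_alive_end_io(struct fake_ctrl *ctrl)
{
	fake_clear_bit(FAKE_KATO_RUNNING, &ctrl->flags);
}
```

In this sketch a second fake_keep_alive() call made before
fake_keep_alive_end_io() returns 0 without submitting anything, which is
the property that keeps the work item from blocking on a reserved tag.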
 drivers/nvme/host/core.c | 8 +++++++-
 drivers/nvme/host/nvme.h | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ea40a3c511da..9b8596eb4047 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1211,6 +1211,7 @@ static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
 	bool startka = false;
 
 	blk_mq_free_request(rq);
+	clear_bit(NVME_CTRL_KATO_RUNNING, &ctrl->flags);
 
 	if (status) {
 		dev_err(ctrl->device,
@@ -1233,10 +1234,15 @@ static int nvme_keep_alive(struct nvme_ctrl *ctrl)
 {
 	struct request *rq;
 
+	if (test_and_set_bit(NVME_CTRL_KATO_RUNNING, &ctrl->flags))
+		return 0;
+
 	rq = nvme_alloc_request(ctrl->admin_q, &ctrl->ka_cmd,
 			BLK_MQ_REQ_RESERVED);
-	if (IS_ERR(rq))
+	if (IS_ERR(rq)) {
+		clear_bit(NVME_CTRL_KATO_RUNNING, &ctrl->flags);
 		return PTR_ERR(rq);
+	}
 
 	rq->timeout = ctrl->kato * HZ;
 	rq->end_io_data = ctrl;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index e6efa085f08a..e00e3400c8b6 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -344,6 +344,7 @@ struct nvme_ctrl {
 	int nr_reconnects;
 	unsigned long flags;
 #define NVME_CTRL_FAILFAST_EXPIRED	0
+#define NVME_CTRL_KATO_RUNNING		1
 	struct nvmf_ctrl_options *opts;
 
 	struct page *discard_page;
-- 
2.29.2



