bad unlock balance WARNING at nvme/045

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Tue Oct 25 19:13:15 PDT 2022


On Oct 18, 2022 / 13:57, Sagi Grimberg wrote:
> 
> > Hello Hannes,
> > 
> > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call
> > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
> > 
> > 	mutex_lock(&ctrl->dhchap_auth_mutex);
> > 	list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> > 		mutex_unlock(&ctrl->dhchap_auth_mutex);
> > 		flush_work(&chap->auth_work);
> > 		__nvme_auth_reset(chap);
> > 	}
> > 	mutex_unlock(&ctrl->dhchap_auth_mutex);
> > 
> > I tried to remove the mutex_unlock in the list iteration with a patch [2], but
> > it resulted in another "WARNING: possible recursive locking detected" [3]. I'm
> > not sure, but the cause of this WARN could be __nvme_auth_work and
> > nvme_dhchap_auth_work sharing the same nvme_wq.
> > 
> > Could you take a look for fix?

Sagi, thank you for the comments.

> I'm looking at the code and I think that the way the concurrent
> negotiations and how dhchap_auth_mutex is handled is very fragile,
> also why should the per-queue auth_work hold the controller-wide
> dhchap_auth_mutex?

It looks to me that the dhchap_auth_mutex lock in nvme_auth_negotiate()
simply guards access to ctrl->dhchap_auth_list (looking up the list and
adding an element to it).

> The only reason I see is because nvme_auth_negotiate
> is checking if the chap context is already queued? Why should we
> allow that?

I'm not sure about this. I would like to ask other experts' comments on it.

> 
> I'd suggest to splice dhchap_auth_list, to a local list and then just
> flush nvmet_wq in teardown flows. Same for renegotiations/reset flows.

Thanks for the suggestion. I'm not sure how to use list splicing of
dhchap_auth_list at renegotiation. But as for reset by nvme_auth_reset(),
your suggestion looks good to me as a way to avoid the unbalanced lock. It
can be applied to teardown by nvme_auth_free() as well. I created a patch [1]
with this approach. It just frees the dhchap contexts at reset and does not
reuse them for negotiation. I think this is fine since the contexts are
re-allocated at renegotiation.

> And we should prevent for the double-queuing of chap negotiations to
> begin with, instead of handling them (I still don't understand why this
> is permitted, but perhaps just return EBUSY in this case?)

Again, I'm not sure about this point.


I confirmed that the patch [1] avoids the unbalanced lock. However, the other
"WARNING: possible recursive locking detected" was still reported. I created
another patch that adds a workqueue dedicated to nvme auth, which avoided that
WARN. Are these fix approaches ok? If so, I can post the two patches for the
two WARNs.

[1]

diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index c8a6db7c4498..c71f1be7d403 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -920,17 +920,25 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid)
 }
 EXPORT_SYMBOL_GPL(nvme_auth_wait);
 
-void nvme_auth_reset(struct nvme_ctrl *ctrl)
+static void nvme_auth_free_queue_contexts(struct nvme_ctrl *ctrl)
 {
-	struct nvme_dhchap_queue_context *chap;
+	struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+	LIST_HEAD(splice);
 
 	mutex_lock(&ctrl->dhchap_auth_mutex);
-	list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
-		mutex_unlock(&ctrl->dhchap_auth_mutex);
+	list_splice_init(&ctrl->dhchap_auth_list, &splice);
+	mutex_unlock(&ctrl->dhchap_auth_mutex);
+
+	list_for_each_entry_safe(chap, tmp, &splice, entry) {
+		list_del_init(&chap->entry);
 		flush_work(&chap->auth_work);
-		__nvme_auth_reset(chap);
+		__nvme_auth_free(chap);
 	}
-	mutex_unlock(&ctrl->dhchap_auth_mutex);
+}
+
+void nvme_auth_reset(struct nvme_ctrl *ctrl)
+{
+	nvme_auth_free_queue_contexts(ctrl);
 }
 EXPORT_SYMBOL_GPL(nvme_auth_reset);
 
@@ -996,15 +1004,8 @@ EXPORT_SYMBOL_GPL(nvme_auth_stop);
 
 void nvme_auth_free(struct nvme_ctrl *ctrl)
 {
-	struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+	nvme_auth_free_queue_contexts(ctrl);
 
-	mutex_lock(&ctrl->dhchap_auth_mutex);
-	list_for_each_entry_safe(chap, tmp, &ctrl->dhchap_auth_list, entry) {
-		list_del_init(&chap->entry);
-		flush_work(&chap->auth_work);
-		__nvme_auth_free(chap);
-	}
-	mutex_unlock(&ctrl->dhchap_auth_mutex);
 	if (ctrl->host_key) {
 		nvme_auth_free_key(ctrl->host_key);
 		ctrl->host_key = NULL;



-- 
Shin'ichiro Kawasaki

