bad unlock balance WARNING at nvme/045
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Tue Oct 25 19:13:15 PDT 2022
On Oct 18, 2022 / 13:57, Sagi Grimberg wrote:
>
> > Hello Hannes,
> >
> > I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call
> > Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
> >
> >     mutex_lock(&ctrl->dhchap_auth_mutex);
> >     list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
> >             mutex_unlock(&ctrl->dhchap_auth_mutex);
> >             flush_work(&chap->auth_work);
> >             __nvme_auth_reset(chap);
> >     }
> >     mutex_unlock(&ctrl->dhchap_auth_mutex);
> >
> > I tried to remove the mutex_unlock in the list iteration with a patch [2], but
> > it resulted in another "WARNING: possible recursive locking detected" [3]. I'm
> > not sure, but the cause of this WARN could be __nvme_auth_work and
> > nvme_dhchap_auth_work running on the same nvme_wq.
> >
> > Could you take a look for a fix?
Sagi, thank you for the comments.
> I'm looking at the code and I think that the way the concurrent
> negotiations and how dhchap_auth_mutex is handled is very fragile,
> also why should the per-queue auth_work hold the controller-wide
> dhchap_auth_mutex?
It looks to me that the dhchap_auth_mutex lock in nvme_auth_negotiate()
simply guards access to ctrl->dhchap_auth_list (looking up the list and
adding an element to the list).
> The only reason I see is because nvme_auth_negotiate
> is checking if the chap context is already queued? Why should we
> allow that?
I'm not sure about this. I would like to ask other experts' comments on it.
>
> I'd suggest to splice dhchap_auth_list, to a local list and then just
> flush nvmet_wq in teardown flows. Same for renegotiations/reset flows.
Thanks for the suggestion. I'm not sure how to use a list splice of
dhchap_auth_list at renegotiation. But as for reset by nvme_auth_reset(),
your suggestion looks good to me as a way to avoid the unbalanced lock. It
can also be applied to the teardown by nvme_auth_free(). I created a patch [1]
with this approach. It just frees the dhchap contexts at reset and does not
reuse them for negotiation. I think this is fine since the contexts are
re-allocated at renegotiation.
> And we should prevent for the double-queuing of chap negotiations to
> begin with, instead of handling them (I still don't understand why this
> is permitted, but perhaps just return EBUSY in this case?)
Again, I'm not sure about this point.
I confirmed that the patch [1] avoids the unbalanced lock. However, the other
"WARNING: possible recursive locking detected" was still reported. I created
another patch that adds a workqueue dedicated to nvme auth, which avoided that
WARN. Are these fix approaches OK? If so, I can post the two patches for the
two WARNs.
[1]
diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
index c8a6db7c4498..c71f1be7d403 100644
--- a/drivers/nvme/host/auth.c
+++ b/drivers/nvme/host/auth.c
@@ -920,17 +920,25 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid)
}
EXPORT_SYMBOL_GPL(nvme_auth_wait);
-void nvme_auth_reset(struct nvme_ctrl *ctrl)
+static void nvme_auth_free_queue_contexts(struct nvme_ctrl *ctrl)
{
- struct nvme_dhchap_queue_context *chap;
+ struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+ LIST_HEAD(splice);
mutex_lock(&ctrl->dhchap_auth_mutex);
- list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
- mutex_unlock(&ctrl->dhchap_auth_mutex);
+ list_splice_init(&ctrl->dhchap_auth_list, &splice);
+ mutex_unlock(&ctrl->dhchap_auth_mutex);
+
+ list_for_each_entry_safe(chap, tmp, &splice, entry) {
+ list_del_init(&chap->entry);
flush_work(&chap->auth_work);
- __nvme_auth_reset(chap);
+ __nvme_auth_free(chap);
}
- mutex_unlock(&ctrl->dhchap_auth_mutex);
+}
+
+void nvme_auth_reset(struct nvme_ctrl *ctrl)
+{
+ nvme_auth_free_queue_contexts(ctrl);
}
EXPORT_SYMBOL_GPL(nvme_auth_reset);
@@ -996,15 +1004,8 @@ EXPORT_SYMBOL_GPL(nvme_auth_stop);
void nvme_auth_free(struct nvme_ctrl *ctrl)
{
- struct nvme_dhchap_queue_context *chap = NULL, *tmp;
+ nvme_auth_free_queue_contexts(ctrl);
- mutex_lock(&ctrl->dhchap_auth_mutex);
- list_for_each_entry_safe(chap, tmp, &ctrl->dhchap_auth_list, entry) {
- list_del_init(&chap->entry);
- flush_work(&chap->auth_work);
- __nvme_auth_free(chap);
- }
- mutex_unlock(&ctrl->dhchap_auth_mutex);
if (ctrl->host_key) {
nvme_auth_free_key(ctrl->host_key);
ctrl->host_key = NULL;
--
Shin'ichiro Kawasaki