bad unlock balance WARNING at nvme/045
Sagi Grimberg
sagi at grimberg.me
Tue Oct 18 03:57:41 PDT 2022
> Hello Hannes,
>
> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1]. As the Call
> Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
>
> mutex_lock(&ctrl->dhchap_auth_mutex);
> list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>         mutex_unlock(&ctrl->dhchap_auth_mutex);
>         flush_work(&chap->auth_work);
>         __nvme_auth_reset(chap);
> }
> mutex_unlock(&ctrl->dhchap_auth_mutex);
>
> I tried to remove the mutex_unlock in the list iteration with a patch [2], but
> it resulted in another "WARNING: possible recursive locking detected" [3]. I'm
> not sure, but the cause of this WARN could be that __nvme_auth_work and
> nvme_dhchap_auth_work run on the same nvme_wq.
>
> Could you take a look and fix it?
I'm looking at the code, and I think the way concurrent negotiations
and the dhchap_auth_mutex are handled is very fragile. Also, why
should the per-queue auth_work hold the controller-wide
dhchap_auth_mutex? The only reason I can see is that
nvme_auth_negotiate checks whether the chap context is already
queued. Why should we allow that?
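
Roughly, the path I'm referring to looks something like this (from
memory, not a verbatim quote, and a field like chap->qid is only my
assumption of how the per-queue context is identified):

        mutex_lock(&ctrl->dhchap_auth_mutex);
        list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
                if (chap->qid == qid) {
                        /* already queued: drop the controller-wide
                         * lock, flush the per-queue work and re-queue
                         */
                        mutex_unlock(&ctrl->dhchap_auth_mutex);
                        flush_work(&chap->auth_work);
                        __nvme_auth_reset(chap);
                        queue_work(nvme_wq, &chap->auth_work);
                        return 0;
                }
        }
        /* otherwise allocate a new context, add it to
         * dhchap_auth_list and queue its auth_work
         */
        mutex_unlock(&ctrl->dhchap_auth_mutex);

That lookup is the only thing in the negotiation path that seems to
need the controller-wide mutex.
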
I'd suggest splicing dhchap_auth_list onto a local list and then just
flushing nvme_wq in the teardown flows, and the same for the
renegotiation/reset flows. And we should prevent the double-queuing of
chap negotiations to begin with instead of handling it (I still don't
understand why this is permitted, but perhaps we can just return
-EBUSY in that case?)
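
Something along these lines (completely untested, just to illustrate
the idea; I'm using the reset flow as the example and assuming the
struct nvme_dhchap_queue_context contexts stay allocated across it,
the teardown flow would free them instead):

void nvme_auth_reset(struct nvme_ctrl *ctrl)
{
        struct nvme_dhchap_queue_context *chap;
        LIST_HEAD(splice);

        /* detach all queued contexts while holding the lock */
        mutex_lock(&ctrl->dhchap_auth_mutex);
        list_splice_init(&ctrl->dhchap_auth_list, &splice);
        mutex_unlock(&ctrl->dhchap_auth_mutex);

        /* wait for in-flight negotiations with the lock dropped */
        flush_workqueue(nvme_wq);

        list_for_each_entry(chap, &splice, entry)
                __nvme_auth_reset(chap);

        /* hand the contexts back for later re-use */
        mutex_lock(&ctrl->dhchap_auth_mutex);
        list_splice(&splice, &ctrl->dhchap_auth_list);
        mutex_unlock(&ctrl->dhchap_auth_mutex);
}

With that, the negotiate-time lookup can either go away entirely or
become a plain -EBUSY return when a context for the queue is already
queued.
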