bad unlock balance WARNING at nvme/045

Sagi Grimberg sagi at grimberg.me
Wed Oct 26 05:38:53 PDT 2022


>>>> Hello Hannes,
>>>>
>>>> I observed "WARNING: bad unlock balance detected!" at nvme/045 [1].
>>>> As the Call
>>>> Trace shows, nvme_auth_reset() has unbalanced mutex lock/unlock.
>>>>
>>>>      mutex_lock(&ctrl->dhchap_auth_mutex);
>>>>      list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>>>>          mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>>          flush_work(&chap->auth_work);
>>>>          __nvme_auth_reset(chap);
>>>>      }
>>>>      mutex_unlock(&ctrl->dhchap_auth_mutex);
>>>>
>>>> I tried to remove the mutex_unlock in the list iteration with a
>>>> patch [2], but it resulted in another "WARNING: possible recursive
>>>> locking detected" [3]. I'm not sure, but the cause of this WARN could
>>>> be __nvme_auth_work and nvme_dhchap_auth_work running in the same
>>>> nvme_wq.
>>>>
>>>> Could you take a look for a fix?
>>>
>>> I'm looking at the code and I think that the way concurrent
>>> negotiations and dhchap_auth_mutex are handled is very fragile.
>>> Also, why should the per-queue auth_work hold the controller-wide
>>> dhchap_auth_mutex? The only reason I see is that nvme_auth_negotiate
>>> is checking if the chap context is already queued. Why should we
>>> allow that?
>>>
>> Well, that's partially due to the internal design of linux-nvme.
>> The controller structure itself doesn't have 'queues' per se; there is just
>> a general 'ctrl' structure. So while I would have loved to have a per-queue
>> structure to hook the chap authentication into, all I have is the controller
>> structure.
>> Hence we have a controller-wide list holding the 'chap' structures for the
>> individual queues, and a controller-wide mutex to gate list modifications.
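
For reference, my rough picture of the current layout (field names follow
auth.c, trimmed for illustration):

    struct nvme_dhchap_queue_context {
            struct list_head entry;         /* linked into ctrl->dhchap_auth_list */
            struct work_struct auth_work;   /* per-queue negotiation work */
            struct nvme_ctrl *ctrl;
            int qid;
            /* transcript buffers, keys, ... */
    };

    struct nvme_ctrl {
            /* ... */
            struct list_head dhchap_auth_list;      /* one chap context per queue */
            struct mutex dhchap_auth_mutex;         /* gates list modifications */
            /* ... */
    };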
>>
>>> I'd suggest splicing dhchap_auth_list to a local list and then just
>>> flushing nvmet_wq in teardown flows, and the same for renegotiation/reset
>>> flows. And we should prevent the double-queuing of chap negotiations to
>>> begin with, instead of handling it (I still don't understand why this
>>> is permitted, but perhaps just return EBUSY in this case?)
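
(To make the splice idea concrete, something along these lines, completely
untested; nvme_auth_free_all and __nvme_auth_free are placeholder names:)

    static void nvme_auth_free_all(struct nvme_ctrl *ctrl)
    {
            struct nvme_dhchap_queue_context *chap, *tmp;
            LIST_HEAD(splice);

            /* detach all contexts while holding the mutex ... */
            mutex_lock(&ctrl->dhchap_auth_mutex);
            list_splice_init(&ctrl->dhchap_auth_list, &splice);
            mutex_unlock(&ctrl->dhchap_auth_mutex);

            /* ... then flush and free them without any controller-wide lock */
            list_for_each_entry_safe(chap, tmp, &splice, entry) {
                    flush_work(&chap->auth_work);
                    list_del_init(&chap->entry);
                    __nvme_auth_free(chap);
            }
    }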
>>
>> We don't double queue; we're re-using the existing entries.
> 
> Hannes, thanks for the explanations.
> 
>>
>> Can you check if this fix works?
>>
>> diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c
>> index c8a6db7c4498..4e824aab30eb 100644
>> --- a/drivers/nvme/host/auth.c
>> +++ b/drivers/nvme/host/auth.c
>> @@ -926,7 +926,6 @@ void nvme_auth_reset(struct nvme_ctrl *ctrl)
>>
>>          mutex_lock(&ctrl->dhchap_auth_mutex);
>>          list_for_each_entry(chap, &ctrl->dhchap_auth_list, entry) {
>> -               mutex_unlock(&ctrl->dhchap_auth_mutex);
>>                  flush_work(&chap->auth_work);
>>                  __nvme_auth_reset(chap);
>>          }
> 
> I confirmed this hunk avoids the "WARNING: bad unlock balance detected!". As far
> as I ran blktests with this change, I observed no failures in other test cases.
> 
> However, I observed another new WARN at nvme/045: "WARNING: possible recursive
> locking detected". I think it was caused by nvme_dhchap_auth_work in nvme_wq
> trying to flush another work, __nvme_auth_work, in the same workqueue. I created
> a patch below which creates another workqueue, nvme_auth_wq, for __nvme_auth_work.
> Do you think this fix approach is acceptable?
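
If I read the proposal correctly, it is roughly the following (my sketch,
untested, just to make sure we are talking about the same thing):

    static struct workqueue_struct *nvme_auth_wq;

    /* created once, e.g. at module init */
    nvme_auth_wq = alloc_workqueue("nvme-auth-wq",
                                   WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
    if (!nvme_auth_wq)
            return -ENOMEM;

    /* and __nvme_auth_work gets queued onto it instead of nvme_wq */
    queue_work(nvme_auth_wq, &chap->auth_work);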

It is fine to flush work on the same workqueue; the problem is that they
share the same lock.

I think what we should do is:
1. have chap contexts in an array and not a list, so we don't need
to traverse the list to locate a context.
2. omit the dhchap_auth_mutex, as I don't see a need to protect the
array.
3. synchronize only the chap context itself, either with a lock,
a flag, or a state (atomic).
4. get rid of the long-lived allocations; they are not optimizing a whole
lot imho, and right now we are reserving something like 5k per queue.
5. add one more synchronization between chap and ctrl accessing the keys.
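
A very rough sketch of (1)-(3), completely untested, assuming the context
grows an atomic 'flags' word with a BUSY bit (names are mine):

    /* in struct nvme_ctrl: an array indexed by qid instead of a list */
    struct nvme_dhchap_queue_context *dhchap_ctxs;

    int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid)
    {
            struct nvme_dhchap_queue_context *chap = &ctrl->dhchap_ctxs[qid];

            /* reject double-queuing instead of reusing a running context */
            if (test_and_set_bit(NVME_DHCHAP_CTX_BUSY, &chap->flags))
                    return -EBUSY;

            queue_work(nvme_wq, &chap->auth_work);
            return 0;
    }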


