[bug report] WARNING: possible circular locking at: rdma_destroy_id+0x17/0x20 [rdma_cm] triggered by blktests nvmeof-mp/002

Guoqing Jiang guoqing.jiang at linux.dev
Fri Aug 26 04:32:22 PDT 2022



On 8/26/22 6:03 PM, yangx.jy at fujitsu.com wrote:
> On 2022/8/25 14:26, Guoqing Jiang wrote:
>>
>> On 8/25/22 1:59 PM, yangx.jy at fujitsu.com wrote:
>>> On 2022/5/25 19:01, Sagi Grimberg wrote:
>>>> iirc this was reported before, based on my analysis lockdep is giving
>>>> a false alarm here. The reason is that the id_priv->handler_mutex cannot
>>>> be the same for both cm_id that is handling the connect and the cm_id
>>>> that is handling the rdma_destroy_id because rdma_destroy_id is
>>>> always called on an already disconnected cm_id, so the deadlock
>>>> lockdep is complaining about cannot happen.
>>> Hi Jason, Bart and Sagi,
>>>
>>> I also think it is actually a false positive.  The cm_id handling the
>>> connection and the cm_id calling rdma_destroy_id() cannot be the same
>>> one, right?
>> I am wondering if it is the same issue as in this thread:
>>
>> https://lore.kernel.org/linux-rdma/CAMGffEm22sP-oKK0D9=vOw77nbS05iwG7MC3DTVB0CyzVFhtXg@mail.gmail.com/
> Hi Guoqing,
>
> Thanks for your feedback.
>
> I think they are the same deadlock issue (i.e. AB vs BCA).  The only
> difference is that two combinations of locks caused the same issue.
>
> It seems that one id_priv->handler_mutex is locked on the newly created
> cm_id and the other id_priv->handler_mutex is locked on the disconnected
> cm_id.
>
>>>> I'm not sure how to settle this.
>>> Do you have any suggestions for removing the false positive by
>>> refactoring the related RDMA/CM code? Sorry, I don't know how to do
>>> it for now.
>> The simplest way is to call lockdep_off in case it is false alarm to
>> avoid the debugging effort, but not everyone likes the idea.
>>
>> https://elixir.bootlin.com/linux/v6.0-rc2/C/ident/lockdep_off
> To be honest, I don't like that fix either. I wonder if we can avoid
> the false positive by changing the related RDMA/CM code.

I would consider it a workaround until the CM code is changed (which I
guess needs more effort, though hopefully I am wrong); otherwise, different
people will keep posting similar reports to the list.
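
To make the "AB vs BCA" point concrete, here is a rough user-space sketch of
why a validator that tracks lock classes (rather than individual lock
instances) reports a cycle in this situation. As far as I understand, every
id_priv->handler_mutex shares one class because they are all initialized at
the same place, so two different cm_ids look identical to lockdep. This is
only a simplified model and not lockdep's actual implementation; the names
CLASS_A/B/C, dep, reset_deps, reachable and record_dependency are made up
for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical lock classes; CLASS_A stands in for handler_mutex. */
enum { CLASS_A, CLASS_B, CLASS_C };

/* dep[a][b] is true once class b has been acquired while class a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Forget all recorded dependencies. */
static void reset_deps(void)
{
    memset(dep, 0, sizeof(dep));
}

/* Depth-first search: is `to` reachable from `from` via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++) {
        if (dep[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record the ordering "held -> acquired" and return true when the new
 * edge closes a cycle in the class graph. The check never looks at which
 * lock instance was involved, only at its class.
 */
static bool record_dependency(int held, int acquired)
{
    bool seen[MAX_CLASSES] = { false };
    bool cycle = reachable(acquired, held, seen);

    dep[held][acquired] = true;
    return cycle;
}
```

Recording A -> B from one path, then B -> C and C -> A from the other,
makes the last call return true even if the two CLASS_A locks belong to
different cm_ids. That is the sense in which the report can be a false
positive: the class graph has a cycle, but no single pair of lock
instances is ever taken in conflicting order.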

Thanks,
Guoqing