[bug report] WARNING: possible circular locking at: rdma_destroy_id+0x17/0x20 [rdma_cm] triggered by blktests nvmeof-mp/002

Fri Aug 26 03:03:18 PDT 2022

On 2022/8/25 14:26, Guoqing Jiang wrote:
> 
> 
> On 8/25/22 1:59 PM, yangx.jy at fujitsu.com wrote:
>> On 2022/5/25 19:01, Sagi Grimberg wrote:
>>> iirc this was reported before, based on my analysis lockdep is giving
>>> a false alarm here. The reason is that the id_priv->handler_mutex cannot
>>> be the same for both cm_id that is handling the connect and the cm_id
>>> that is handling the rdma_destroy_id because rdma_destroy_id call
>>> is always called on a already disconnected cm_id, so this deadlock
>>> lockdep is complaining about cannot happen.
>> Hi Jason, Bart and Sagi,
>>
>> I also think it is actually a false positive.  The cm_id handling the
>> connection and the cm_id calling rdma_destroy_id() cannot be the same
>> one, right?
> 
> I am wondering if it is the same as the thread.
> 
> https://lore.kernel.org/linux-rdma/CAMGffEm22sP-oKK0D9=vOw77nbS05iwG7MC3DTVB0CyzVFhtXg@mail.gmail.com/ 

Hi Guoqing,

Thanks for your feedback.

I think both threads describe the same lock-ordering issue (i.e. AB vs
BCA).  The only difference is that a different combination of locks
forms the cycle in each report.

It seems that one id_priv->handler_mutex is taken on the newly created
cm_id and the other id_priv->handler_mutex is taken on the disconnected
cm_id, so they are two different mutex instances.

> 
> 
>>> I'm not sure how to settle this.
>> Do you have any suggestion to remove the false positive by refactoring
>> the related RDMA/CM code. Sorry, I didn't know how to do it for now.
> 
> The simplest way is to call lockdep_off in case it is false alarm to
> avoid the debugging effort, but not everyone likes the idea.
> 
> https://elixir.bootlin.com/linux/v6.0-rc2/C/ident/lockdep_off

To be honest, I don't like that approach either.  I wonder if we can
avoid the false positive by changing the related RDMA/CM code instead.

Best Regards,
Xiao Yang

> 
> Thanks,
> Guoqing