[REPORT] possible circular locking dependency when booting a VM on arm64 host

Salil Mehta salil.mehta at huawei.com
Thu Jul 16 04:14:57 EDT 2020


> From: Salil Mehta
> Sent: Thursday, July 16, 2020 1:53 AM
> To: 'Marc Zyngier' <maz at kernel.org>; yuzenghui <yuzenghui at huawei.com>
> 
> > From: Marc Zyngier [mailto:maz at kernel.org]
> > Sent: Wednesday, July 15, 2020 5:09 PM
> > To: yuzenghui <yuzenghui at huawei.com>
> >
> > Hi Zenghui,
> >
> > On 2020-07-09 11:41, Zenghui Yu wrote:
> > > Hi All,
> > >
> > > I had seen the following lockdep splat when booting a guest on my
> > > Kunpeng 920 with GICv4 enabled. I can also trigger the same splat
> > > on v5.5 so it should already exist in the kernel for a while. I'm
> > > not sure what the exact problem is and hope someone can have a look!
> >
> > I can't manage to trigger this splat on my D05, despite running guests
> > with GICv4 enabled. A couple of questions below:
> 
> 
> Sorry I forgot to update but I did try on Friday and I could not manage
> to trigger it on D06/Kunpeng920 either. I used 5.8.0-rc4.
> 
> 
> > > Thanks,
> > > Zenghui
> > >
> > > [  103.855511] ======================================================
> > > [  103.861664] WARNING: possible circular locking dependency detected
> > > [  103.867817] 5.8.0-rc4+ #35 Tainted: G        W
> > > [  103.872932] ------------------------------------------------------
> > > [  103.879083] CPU 2/KVM/20515 is trying to acquire lock:
> > > [  103.884200] ffff202fcd5865b0 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x60/0xa0
> > > [  103.893127]
> > >                but task is already holding lock:
> > > [  103.898933] ffff202fcfd07f58 (&rq->lock){-.-.}-{2:2}, at: __schedule+0x114/0x8b8
> > > [  103.906301]
> > >                which lock already depends on the new lock.
> > >
> > > [  103.914441]
> > >                the existing dependency chain (in reverse order) is:
> > > [  103.921888]
> > >                -> #3 (&rq->lock){-.-.}-{2:2}:
> > > [  103.927438]        _raw_spin_lock+0x54/0x70
> > > [  103.931605]        task_fork_fair+0x48/0x150
> > > [  103.935860]        sched_fork+0x100/0x268
> > > [  103.939856]        copy_process+0x628/0x1868
> > > [  103.944106]        _do_fork+0x74/0x710
> > > [  103.947840]        kernel_thread+0x78/0xa0
> > > [  103.951917]        rest_init+0x30/0x270
> > > [  103.955742]        arch_call_rest_init+0x14/0x1c
> > > [  103.960339]        start_kernel+0x534/0x568
> > > [  103.964503]
> > >                -> #2 (&p->pi_lock){-.-.}-{2:2}:
> > > [  103.970224]        _raw_spin_lock_irqsave+0x70/0x98
> > > [  103.975080]        try_to_wake_up+0x5c/0x5b0
> > > [  103.979330]        wake_up_process+0x28/0x38
> > > [  103.983581]        create_worker+0x128/0x1b8
> > > [  103.987834]        workqueue_init+0x308/0x3bc
> > > [  103.992172]        kernel_init_freeable+0x180/0x33c
> > > [  103.997027]        kernel_init+0x18/0x118
> > > [  104.001020]        ret_from_fork+0x10/0x18
> > > [  104.005097]
> > >                -> #1 (&pool->lock){-.-.}-{2:2}:
> > > [  104.010817]        _raw_spin_lock+0x54/0x70
> > > [  104.014983]        __queue_work+0x120/0x6e8
> > > [  104.019146]        queue_work_on+0xa0/0xd8
> > > [  104.023225]        irq_set_affinity_locked+0xa8/0x178
> > > [  104.028253]        __irq_set_affinity+0x5c/0x90
> > > [  104.032762]        irq_set_affinity_hint+0x74/0xb0
> > > [  104.037540]        hns3_nic_init_irq+0xe0/0x210 [hns3]
> > > [  104.042655]        hns3_client_init+0x2d8/0x4e0 [hns3]
> > > [  104.047779]        hclge_init_client_instance+0xf0/0x3a8 [hclge]
> > > [  104.053760]        hnae3_init_client_instance.part.3+0x30/0x68 [hnae3]
> > > [  104.060257]        hnae3_register_ae_dev+0x100/0x1f0 [hnae3]
> > > [  104.065892]        hns3_probe+0x60/0xa8 [hns3]
> >
> > Are you performing some kind of PCIe hot-plug here? Or is that done
> > at boot only? It seems to help triggering the splat.
> 
> 
> I am not sure how you can do that since HNS3 is integrated NIC so
> physical hot-plug is definitely ruled out. local_pci_probe()
> should also get called when we insert the hns3_enet module which
> eventually initializes the driver.

Or perhaps you meant below?

echo 1 > /sys/bus/pci/devices/xxxx/xx.x/remove
echo 1 > /sys/bus/pci/rescan

The above is not being used; I confirmed this with Zenghui earlier.
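
For readers following the quoted splat, here is a rough sketch of why lockdep complains. It is only an illustration in Python, not how lockdep is implemented. The edges are my reading of the quoted chain and are an assumption: entry #1 takes pool->lock from irq_set_affinity_locked() (which runs with the irq_desc lock held), #2 takes p->pi_lock, and #3 takes rq->lock. The #0 entry is not part of the quoted text. The names EXISTING_ORDER, find_path and closes_cycle are made up for this example.

```python
from typing import Dict, List, Optional

# Assumed lock ordering, read as "lock held -> lock acquired next".
# Inferred from entries #1..#3 of the quoted splat; hypothetical model only.
EXISTING_ORDER: Dict[str, List[str]] = {
    "irq_desc_lock": ["pool->lock"],  # #1: queue_work_on() under the desc lock
    "pool->lock": ["p->pi_lock"],     # #2: try_to_wake_up() from create_worker()
    "p->pi_lock": ["rq->lock"],       # #3: rq->lock taken after pi_lock
}


def find_path(graph: Dict[str, List[str]], src: str, dst: str) -> Optional[List[str]]:
    """Depth-first search; return one path from src to dst, or None."""
    stack = [(src, [src])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt]))
    return None


def closes_cycle(graph: Dict[str, List[str]], held: str, wanted: str) -> Optional[List[str]]:
    """Taking `wanted` while holding `held` adds the edge held -> wanted.

    That edge closes a cycle if `wanted` can already reach `held`.
    Returns the existing path wanted -> ... -> held, or None.
    """
    return find_path(graph, wanted, held)


# The splat: the KVM vCPU thread holds rq->lock and wants the irq_desc lock.
print(closes_cycle(EXISTING_ORDER, held="rq->lock", wanted="irq_desc_lock"))
```

The existing path from the irq_desc lock to rq->lock, plus the new rq->lock -> irq_desc lock acquisition in __schedule(), is what closes the loop in this model.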

 



