PROBLEM: mtk_pcie_suspend_noirq sleep hang breaking suspend on MT8192 Asurada Spherion

Leonard Lausen leonard at lausen.nl
Sun Feb 11 17:34:16 PST 2024


Dear Angelo, Dear Maintainers,

thank you for your help supporting MT8192 platform. Unfortunataly, the issue with Asurada Spherion discussed in this thread persists with v6.8-rc4. I'm including an updated dmesg snippet below. Please let me know if I can provide any further information to help debug this issue.

Thank you
Leonard

[  688.910709] PM: suspend entry (deep)
[  688.978833] Filesystems sync: 0.064 seconds
[  688.987422] Freezing user space processes
[  688.999288] Freezing user space processes completed (elapsed 0.007 seconds)
[  689.006391] OOM killer disabled.
[  689.009632] Freezing remaining freezable tasks
[  689.016980] Freezing remaining freezable tasks completed (elapsed 0.002 seconds)
[  689.024376] printk: Suspending console(s) (use no_console_suspend to debug)
[  689.181542] queueing ieee80211 work while going to suspend

[  689.301195] ========================================================
[  689.301197] WARNING: possible irq lock inversion dependency detected
[  689.301199] 6.8.0-rc4-cos-mt9 #1 Not tainted
[  689.301203] --------------------------------------------------------
[  689.301204] systemd-sleep/4626 just changed the state of lock:
[  689.301208] ffff610985759560 (&pcie->irq_lock){+...}-{2:2}, at: mtk_pcie_suspend_noirq+0xd0/0x14c
[  689.301228] but this lock was taken by another, HARDIRQ-safe lock in the past:
[  689.301230]  (&irq_desc_lock_class){-.-.}-{2:2}
[  689.301234] 
               
               and interrupts could create inverse lock ordering between them.

[  689.301236] 
               other info that might help us debug this:
[  689.301238]  Possible interrupt unsafe locking scenario:

[  689.301240]        CPU0                    CPU1
[  689.301241]        ----                    ----
[  689.301242]   lock(&pcie->irq_lock);
[  689.301246]                                local_irq_disable();
[  689.301247]                                lock(&irq_desc_lock_class);
[  689.301251]                                lock(&pcie->irq_lock);
[  689.301254]   <Interrupt>
[  689.301255]     lock(&irq_desc_lock_class);
[  689.301258] 
                *** DEADLOCK ***

[  689.301259] 4 locks held by systemd-sleep/4626:
[  689.301262]  #0: ffff610987c133f8 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0x2ac/0x340
[  689.301278]  #1: ffff610a2d9a4688 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf0/0x1b0
[  689.301289]  #2: ffff610980b74158 (kn->active#115){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf8/0x1b0
[  689.301300]  #3: ffffd2e4d5792cc0 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x98/0x330
[  689.301313] 
               the shortest dependencies between 2nd lock and 1st lock:
[  689.301315]  -> (&irq_desc_lock_class){-.-.}-{2:2} {
[  689.301321]     IN-HARDIRQ-W at:
[  689.301324]                       lock_acquire+0x11c/0x324
[  689.301329]                       _raw_spin_lock+0x48/0x60
[  689.301337]                       handle_fasteoi_irq+0x2c/0x23c
[  689.301342]                       generic_handle_domain_irq+0x2c/0x44
[  689.301348]                       gic_handle_irq+0x16c/0x29c
[  689.301352]                       call_on_irq_stack+0x24/0x4c
[  689.301357]                       do_interrupt_handler+0x80/0x84
[  689.301362]                       el1_interrupt+0x44/0x98
[  689.301368]                       el1h_64_irq_handler+0x18/0x24
[  689.301374]                       el1h_64_irq+0x78/0x7c
[  689.301377]                       default_idle_call+0xa0/0x14c
[  689.301381]                       do_idle+0x270/0x2d0
[  689.301387]                       cpu_startup_entry+0x38/0x3c
[  689.301392]                       rest_init+0x118/0x1a8
[  689.301395]                       arch_post_acpi_subsys_init+0x0/0x8
[  689.301403]                       start_kernel+0x5b8/0x6ac
[  689.301407]                       __primary_switched+0xb8/0xc0
[  689.301411]     IN-SOFTIRQ-W at:
[  689.301414]                       lock_acquire+0x11c/0x324
[  689.301418]                       _raw_spin_lock+0x48/0x60
[  689.301423]                       handle_fasteoi_irq+0x2c/0x23c
[  689.301427]                       generic_handle_domain_irq+0x2c/0x44
[  689.301432]                       gic_handle_irq+0x16c/0x29c
[  689.301435]                       do_interrupt_handler+0x50/0x84
[  689.301439]                       el1_interrupt+0x44/0x98
[  689.301445]                       el1h_64_irq_handler+0x18/0x24
[  689.301451]                       el1h_64_irq+0x78/0x7c
[  689.301454]                       _nohz_idle_balance.isra.0+0xd4/0x388
[  689.301458]                       run_rebalance_domains+0x64/0x74
[  689.301462]                       __do_softirq+0x148/0x518
[  689.301465]                       ____do_softirq+0x10/0x1c
[  689.301469]                       call_on_irq_stack+0x24/0x4c
[  689.301473]                       do_softirq_own_stack+0x1c/0x2c
[  689.301477]                       __irq_exit_rcu+0x12c/0x148
[  689.301484]                       irq_exit_rcu+0x10/0x38
[  689.301489]                       el1_interrupt+0x48/0x98
[  689.301495]                       el1h_64_irq_handler+0x18/0x24
[  689.301500]                       el1h_64_irq+0x78/0x7c
[  689.301503]                       default_idle_call+0xa0/0x14c
[  689.301507]                       do_idle+0x270/0x2d0
[  689.301511]                       cpu_startup_entry+0x38/0x3c
[  689.301517]                       rest_init+0x118/0x1a8
[  689.301520]                       arch_post_acpi_subsys_init+0x0/0x8
[  689.301525]                       start_kernel+0x5b8/0x6ac
[  689.301529]                       __primary_switched+0xb8/0xc0
[  689.301533]     INITIAL USE at:
[  689.301536]                      lock_acquire+0x11c/0x324
[  689.301540]                      _raw_spin_lock_irqsave+0x68/0xbc
[  689.301545]                      __irq_get_desc_lock+0x58/0x98
[  689.301551]                      irq_modify_status+0x38/0x140
[  689.301555]                      irq_set_percpu_devid+0x70/0x94
[  689.301561]                      gic_irq_domain_alloc+0x208/0x248
[  689.301568]                      irq_domain_alloc_irqs_locked+0xf8/0x358
[  689.301573]                      __irq_domain_alloc_irqs+0x6c/0xc0
[  689.301578]                      gic_init_bases+0x358/0x6a4
[  689.301585]                      gic_of_init+0x290/0x31c
[  689.301589]                      of_irq_init+0x304/0x390
[  689.301596]                      irqchip_init+0x18/0x24
[  689.301601]                      init_IRQ+0xb4/0x12c
[  689.301606]                      start_kernel+0x26c/0x6ac
[  689.301610]                      __primary_switched+0xb8/0xc0
[  689.301614]   }
[  689.301615]   ... key      at: [<ffffd2e4d65b8bc0>] irq_desc_lock_class+0x0/0x10
[  689.301621]   ... acquired at:
[  689.301623]    _raw_spin_lock_irqsave+0x68/0xbc
[  689.301629]    mtk_msi_bottom_irq_unmask+0x30/0x70
[  689.301633]    irq_chip_unmask_parent+0x1c/0x28
[  689.301637]    mtk_pcie_msi_irq_unmask+0x20/0x30
[  689.301642]    irq_enable+0x40/0x8c
[  689.301645]    __irq_startup+0x78/0xa4
[  689.301649]    irq_startup+0xfc/0x164
[  689.301653]    __setup_irq+0x3f0/0x6bc
[  689.301656]    request_threaded_irq+0xec/0x1a4
[  689.301659]    pcie_pme_probe+0xf4/0x1bc
[  689.301666]    pcie_port_probe_service+0x38/0x64
[  689.301670]    really_probe+0x148/0x2ac
[  689.301676]    __driver_probe_device+0x78/0x12c
[  689.301680]    driver_probe_device+0x3c/0x160
[  689.301683]    __device_attach_driver+0xb8/0x138
[  689.301687]    bus_for_each_drv+0x80/0xdc
[  689.301691]    __device_attach+0x9c/0x188
[  689.301695]    device_initial_probe+0x14/0x20
[  689.301699]    bus_probe_device+0xac/0xb0
[  689.301702]    device_add+0x5ac/0x768
[  689.301708]    device_register+0x20/0x30
[  689.301714]    pcie_portdrv_probe+0x340/0x5c4
[  689.301718]    local_pci_probe+0x40/0xa4
[  689.301722]    pci_device_probe+0xac/0x1f0
[  689.301727]    really_probe+0x148/0x2ac
[  689.301730]    __driver_probe_device+0x78/0x12c
[  689.301734]    driver_probe_device+0x3c/0x160
[  689.301738]    __device_attach_driver+0xb8/0x138
[  689.301742]    bus_for_each_drv+0x80/0xdc
[  689.301745]    __device_attach+0x9c/0x188
[  689.301748]    device_attach+0x14/0x20
[  689.301752]    pci_bus_add_device+0x64/0xd4
[  689.301756]    pci_bus_add_devices+0x3c/0x88
[  689.301760]    pci_host_probe+0x44/0xbc
[  689.301764]    mtk_pcie_probe+0x288/0x47c
[  689.301768]    platform_probe+0x68/0xc0
[  689.301773]    really_probe+0x148/0x2ac
[  689.301777]    __driver_probe_device+0x78/0x12c
[  689.301781]    driver_probe_device+0x3c/0x160
[  689.301784]    __device_attach_driver+0xb8/0x138
[  689.301788]    bus_for_each_drv+0x80/0xdc
[  689.301791]    __device_attach+0x9c/0x188
[  689.301795]    device_initial_probe+0x14/0x20
[  689.301799]    bus_probe_device+0xac/0xb0
[  689.301802]    deferred_probe_work_func+0x8c/0xc8
[  689.301806]    process_one_work+0x1ec/0x4dc
[  689.301812]    worker_thread+0x1dc/0x3d4
[  689.301817]    kthread+0xfc/0x100
[  689.301821]    ret_from_fork+0x10/0x20

[  689.301826] -> (&pcie->irq_lock){+...}-{2:2} {
[  689.301832]    HARDIRQ-ON-W at:
[  689.301835]                     lock_acquire+0x11c/0x324
[  689.301839]                     _raw_spin_lock+0x48/0x60
[  689.301844]                     mtk_pcie_suspend_noirq+0xd0/0x14c
[  689.301849]                     dpm_run_callback+0x34/0x9c
[  689.301854]                     __device_suspend_noirq+0x68/0x1f0
[  689.301859]                     dpm_noirq_suspend_devices+0x1d8/0x274
[  689.301864]                     dpm_suspend_noirq+0x24/0x98
[  689.301868]                     suspend_devices_and_enter+0x3b0/0x65c
[  689.301874]                     pm_suspend+0x1fc/0x330
[  689.301880]                     state_store+0x80/0xec
[  689.301885]                     kobj_attr_store+0x18/0x2c
[  689.301889]                     sysfs_kf_write+0x4c/0x78
[  689.301893]                     kernfs_fop_write_iter+0x120/0x1b0
[  689.301896]                     vfs_write+0x224/0x340
[  689.301902]                     ksys_write+0x6c/0x100
[  689.301908]                     __arm64_sys_write+0x1c/0x28
[  689.301914]                     invoke_syscall+0x48/0x114
[  689.301920]                     el0_svc_common.constprop.0+0x40/0xe0
[  689.301926]                     do_el0_svc+0x1c/0x28
[  689.301932]                     el0_svc+0x50/0x108
[  689.301937]                     el0t_64_sync_handler+0x100/0x12c
[  689.301943]                     el0t_64_sync+0x1a4/0x1a8
[  689.301946]    INITIAL USE at:
[  689.301948]                    lock_acquire+0x11c/0x324
[  689.301952]                    _raw_spin_lock_irqsave+0x68/0xbc
[  689.301957]                    mtk_msi_bottom_irq_unmask+0x30/0x70
[  689.301962]                    irq_chip_unmask_parent+0x1c/0x28
[  689.301966]                    mtk_pcie_msi_irq_unmask+0x20/0x30
[  689.301970]                    irq_enable+0x40/0x8c
[  689.301974]                    __irq_startup+0x78/0xa4
[  689.301978]                    irq_startup+0xfc/0x164
[  689.301982]                    __setup_irq+0x3f0/0x6bc
[  689.301984]                    request_threaded_irq+0xec/0x1a4
[  689.301987]                    pcie_pme_probe+0xf4/0x1bc
[  689.301993]                    pcie_port_probe_service+0x38/0x64
[  689.301997]                    really_probe+0x148/0x2ac
[  689.302001]                    __driver_probe_device+0x78/0x12c
[  689.302004]                    driver_probe_device+0x3c/0x160
[  689.302008]                    __device_attach_driver+0xb8/0x138
[  689.302012]                    bus_for_each_drv+0x80/0xdc
[  689.302015]                    __device_attach+0x9c/0x188
[  689.302018]                    device_initial_probe+0x14/0x20
[  689.302022]                    bus_probe_device+0xac/0xb0
[  689.302026]                    device_add+0x5ac/0x768
[  689.302031]                    device_register+0x20/0x30
[  689.302037]                    pcie_portdrv_probe+0x340/0x5c4
[  689.302041]                    local_pci_probe+0x40/0xa4
[  689.302044]                    pci_device_probe+0xac/0x1f0
[  689.302048]                    really_probe+0x148/0x2ac
[  689.302052]                    __driver_probe_device+0x78/0x12c
[  689.302056]                    driver_probe_device+0x3c/0x160
[  689.302059]                    __device_attach_driver+0xb8/0x138
[  689.302063]                    bus_for_each_drv+0x80/0xdc
[  689.302066]                    __device_attach+0x9c/0x188
[  689.302070]                    device_attach+0x14/0x20
[  689.302074]                    pci_bus_add_device+0x64/0xd4
[  689.302077]                    pci_bus_add_devices+0x3c/0x88
[  689.302080]                    pci_host_probe+0x44/0xbc
[  689.302083]                    mtk_pcie_probe+0x288/0x47c
[  689.302088]                    platform_probe+0x68/0xc0
[  689.302093]                    really_probe+0x148/0x2ac
[  689.302096]                    __driver_probe_device+0x78/0x12c
[  689.302100]                    driver_probe_device+0x3c/0x160
[  689.302104]                    __device_attach_driver+0xb8/0x138
[  689.302108]                    bus_for_each_drv+0x80/0xdc
[  689.302111]                    __device_attach+0x9c/0x188
[  689.302115]                    device_initial_probe+0x14/0x20
[  689.302119]                    bus_probe_device+0xac/0xb0
[  689.302122]                    deferred_probe_work_func+0x8c/0xc8
[  689.302125]                    process_one_work+0x1ec/0x4dc
[  689.302130]                    worker_thread+0x1dc/0x3d4
[  689.302135]                    kthread+0xfc/0x100
[  689.302139]                    ret_from_fork+0x10/0x20
[  689.302143]  }
[  689.302144]  ... key      at: [<ffffd2e4d65f8690>] __key.1+0x0/0x10
[  689.302150]  ... acquired at:
[  689.302151]    __lock_acquire+0x374/0x1a10
[  689.302155]    lock_acquire+0x11c/0x324
[  689.302159]    _raw_spin_lock+0x48/0x60
[  689.302163]    mtk_pcie_suspend_noirq+0xd0/0x14c
[  689.302168]    dpm_run_callback+0x34/0x9c
[  689.302172]    __device_suspend_noirq+0x68/0x1f0
[  689.302177]    dpm_noirq_suspend_devices+0x1d8/0x274
[  689.302181]    dpm_suspend_noirq+0x24/0x98
[  689.302186]    suspend_devices_and_enter+0x3b0/0x65c
[  689.302191]    pm_suspend+0x1fc/0x330
[  689.302197]    state_store+0x80/0xec
[  689.302202]    kobj_attr_store+0x18/0x2c
[  689.302205]    sysfs_kf_write+0x4c/0x78
[  689.302209]    kernfs_fop_write_iter+0x120/0x1b0
[  689.302213]    vfs_write+0x224/0x340
[  689.302218]    ksys_write+0x6c/0x100
[  689.302224]    __arm64_sys_write+0x1c/0x28
[  689.302230]    invoke_syscall+0x48/0x114
[  689.302235]    el0_svc_common.constprop.0+0x40/0xe0
[  689.302241]    do_el0_svc+0x1c/0x28
[  689.302246]    el0_svc+0x50/0x108
[  689.302252]    el0t_64_sync_handler+0x100/0x12c
[  689.302257]    el0t_64_sync+0x1a4/0x1a8

[  689.302261] 
               stack backtrace:
[  689.302264] CPU: 4 PID: 4626 Comm: systemd-sleep Not tainted 6.8.0-rc4-cos-mt9 #1
[  689.302268] Hardware name: Google Spherion (rev0 - 3) (DT)
[  689.302271] Call trace:
[  689.302272]  dump_backtrace+0x98/0xf0
[  689.302276]  show_stack+0x18/0x24
[  689.302279]  dump_stack_lvl+0xb8/0x118
[  689.302285]  dump_stack+0x18/0x24
[  689.302290]  print_irq_inversion_bug.part.0+0x1ec/0x27c
[  689.302294]  mark_lock+0x300/0x720
[  689.302298]  __lock_acquire+0x374/0x1a10
[  689.302301]  lock_acquire+0x11c/0x324
[  689.302305]  _raw_spin_lock+0x48/0x60
[  689.302310]  mtk_pcie_suspend_noirq+0xd0/0x14c
[  689.302315]  dpm_run_callback+0x34/0x9c
[  689.302319]  __device_suspend_noirq+0x68/0x1f0
[  689.302324]  dpm_noirq_suspend_devices+0x1d8/0x274
[  689.302328]  dpm_suspend_noirq+0x24/0x98
[  689.302332]  suspend_devices_and_enter+0x3b0/0x65c
[  689.302338]  pm_suspend+0x1fc/0x330
[  689.302343]  state_store+0x80/0xec
[  689.302348]  kobj_attr_store+0x18/0x2c
[  689.302352]  sysfs_kf_write+0x4c/0x78
[  689.302355]  kernfs_fop_write_iter+0x120/0x1b0
[  689.302359]  vfs_write+0x224/0x340
[  689.302364]  ksys_write+0x6c/0x100
[  689.302370]  __arm64_sys_write+0x1c/0x28
[  689.302376]  invoke_syscall+0x48/0x114
[  689.302381]  el0_svc_common.constprop.0+0x40/0xe0
[  689.302387]  do_el0_svc+0x1c/0x28
[  689.302392]  el0_svc+0x50/0x108
[  689.302398]  el0t_64_sync_handler+0x100/0x12c
[  689.302403]  el0t_64_sync+0x1a4/0x1a8
[  689.305522] Disabling non-boot CPUs ...
[  689.309813] IRQ273: set affinity failed(-22).
[  689.309868] IRQ288: set affinity failed(-22).
[  689.309993] psci: CPU1 killed (polled 0 ms)
[  689.314674] IRQ273: set affinity failed(-22).
[  689.314725] IRQ288: set affinity failed(-22).
[  689.314832] psci: CPU2 killed (polled 0 ms)
[  689.318738] IRQ273: set affinity failed(-22).
[  689.318781] IRQ288: set affinity failed(-22).
[  689.318904] psci: CPU3 killed (polled 0 ms)
[  689.320721] IRQ273: set affinity failed(-22).
[  689.320733] IRQ288: set affinity failed(-22).
[  689.320809] psci: CPU4 killed (polled 0 ms)
[  689.322718] IRQ273: set affinity failed(-22).
[  689.322738] IRQ288: set affinity failed(-22).
[  689.322852] psci: CPU5 killed (polled 0 ms)
[  689.326646] psci: CPU6 killed (polled 0 ms)
[  689.328620] psci: CPU7 killed (polled 0 ms)
[  689.331564] Enabling non-boot CPUs ...
[  689.332951] Detected VIPT I-cache on CPU1
[  689.333012] GICv3: CPU1: found redistributor 100 region 0:0x000000000c060000
[  689.333079] CPU1: Booted secondary processor 0x0000000100 [0x412fd050]
[  689.335992] CPU1 is up
[  689.336956] Detected VIPT I-cache on CPU2
[  689.337006] GICv3: CPU2: found redistributor 200 region 0:0x000000000c080000
[  689.337063] CPU2: Booted secondary processor 0x0000000200 [0x412fd050]
[  689.339991] CPU2 is up
[  689.341081] Detected VIPT I-cache on CPU3
[  689.341127] GICv3: CPU3: found redistributor 300 region 0:0x000000000c0a0000
[  689.341179] CPU3: Booted secondary processor 0x0000000300 [0x412fd050]
[  689.344328] CPU3 is up
[  689.345125] Detected PIPT I-cache on CPU4
[  689.345146] GICv3: CPU4: found redistributor 400 region 0:0x000000000c0c0000
[  689.345167] CPU4: Booted secondary processor 0x0000000400 [0x414fd0b0]
[  689.346438] CPU4 is up
[  689.347212] Detected PIPT I-cache on CPU5
[  689.347231] GICv3: CPU5: found redistributor 500 region 0:0x000000000c0e0000
[  689.347251] CPU5: Booted secondary processor 0x0000000500 [0x414fd0b0]
[  689.348409] CPU5 is up
[  689.349783] Detected PIPT I-cache on CPU6
[  689.349803] GICv3: CPU6: found redistributor 600 region 0:0x000000000c100000
[  689.349822] CPU6: Booted secondary processor 0x0000000600 [0x414fd0b0]
[  689.351051] CPU6 is up
[  689.351810] Detected PIPT I-cache on CPU7
[  689.351830] GICv3: CPU7: found redistributor 700 region 0:0x000000000c120000
[  689.351849] CPU7: Booted secondary processor 0x0000000700 [0x414fd0b0]
[  689.353144] CPU7 is up
[  691.492453] OOM killer enabled.
[  691.495588] Restarting tasks ... done.
[  691.501402] random: crng reseeded on system resumption
[  691.504932] PM: suspend exit


September 20, 2023 at 4:22 AM, "AngeloGioacchino Del Regno" <angelogioacchino.delregno at collabora.com> wrote: 
> Il 20/09/23 03:28, Leonard Lausen ha scritto:
> 
> > 
> > Dear AngeloGioacchino, Dear Maintainers,
> > 
> >  on MT8192 Asurada Spherion (Acer 514), I observe the following sleep hang, causing > a failure to suspend the system. The hang looks related to the deadlock you fixed > for MT8195 Tomato Chromebook in the past, thus I'm including you in the To: line. > The sleep hang happens with both v6.5.4 as well as v6.5.4 with > tags/mediatek-drm-next-6.6 merged in. (I'm unable to validate v6.6-rc2 currently, > as there's a regression breaking boot.) Please let me know if I can provide any > additional information to help debug this issue or if you have any ideas how to > address this bug.
> > 
> 
> Looks like the issue is not related to the PCIe driver, but to something else not
> 
> going to sleep (and the ChromeOS EC detecting that, and preventing the kernel to
> 
> believe that the platform actually went to sleep).
> 
> I'll try to check what's going on with MT8192 sleep ASAP.
> 
> Anyway, as for the v6.6-rc2 regression, that's happening on MT8195 as well.. and
> 
> I have solved it with this commit:
> 
> https://lore.kernel.org/lkml/20230918150043.403250-1-angelogioacchino.delregno@collabora.com/
> 
> Regards,
> 
> Angelo



More information about the Linux-mediatek mailing list