[PATCH] mailbox: mtk-cmdq: Fix sleeping function called from invalid context

Jassi Brar jassisinghbrar at gmail.com
Tue Apr 30 17:57:54 PDT 2024


On Tue, Apr 30, 2024 at 4:39 AM AngeloGioacchino Del Regno
<angelogioacchino.delregno at collabora.com> wrote:
>
> Il 30/04/24 10:39, Jason-JH.Lin ha scritto:
> > When we run kernel with lockdebug option, we will get the BUG below:
> > [  106.692124] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1164
> > [  106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3616, name: kworker/u17:3
> > [  106.692226] preempt_count: 1, expected: 0
> > [  106.692254] RCU nest depth: 0, expected: 0
> > [  106.692282] INFO: lockdep is turned off.
> > [  106.692306] irq event stamp: 0
> > [  106.692331] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> > [  106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>] copy_process+0xc90/0x2ac0
> > [  106.692429] softirqs last  enabled at (0): [<ffffffee15d37fc4>] copy_process+0xcb4/0x2ac0
> > [  106.692473] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [  106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted 6.1.87-lockdep-14133-g26e933aca785 #1 6839942e1cf34914b0a366137843dd2366f52aa9
> > [  106.692556] Hardware name: Google Ciri sku0/unprovisioned board (DT)
> > [  106.692586] Workqueue: imgsys_runner imgsys_runner_func
> > [  106.692638] Call trace:
> > [  106.692662]  dump_backtrace+0x100/0x120
> > [  106.692702]  show_stack+0x20/0x2c
> > [  106.692737]  dump_stack_lvl+0x84/0xb4
> > [  106.692775]  dump_stack+0x18/0x48
> > [  106.692809]  __might_resched+0x354/0x4c0
> > [  106.692847]  __might_sleep+0x98/0xe4
> > [  106.692883]  __pm_runtime_resume+0x70/0x124
> > [  106.692921]  cmdq_mbox_send_data+0xe4/0xb1c
> > [  106.692964]  msg_submit+0x194/0x2dc
> > [  106.693003]  mbox_send_message+0x190/0x330
> > [  106.693043]  imgsys_cmdq_sendtask+0x1618/0x2224
> > [  106.693082]  imgsys_runner_func+0xac/0x11c
> > [  106.693118]  process_one_work+0x638/0xf84
> > [  106.693158]  worker_thread+0x808/0xcd0
> > [  106.693196]  kthread+0x24c/0x324
> > [  106.693231]  ret_from_fork+0x10/0x20
> >
> > We found that there is a spin_lock_irqsave protection in msg_submit()
> > of mailbox.c.
> > So when cmdq driver calls pm_runtime_get_sync() in cmdq_mbox_send_data(),
> > it will get this BUG report.
> >
> > Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in atomic
> > context with interrupts disabled.
> >
>
> I see. The problem with this is that pm_runtime_irq_safe() will raise the refcount
> of the parent device "forever", which isn't the best and partially defeats what we
> are trying to do here (keeping stuff off unless really needed).
>
> At this point I wonder if it'd be just better to really fix this instead of working
> around the problem.
>
> I'd say that one of the options would be to perform a change to msg_submit() so
> that it will take into account that a .send_data() callback might be using RPM
> functions.
>
> Ideas? :-)
>
@send_data:  The API asks the MBOX controller driver, in atomic
 *      context try to transmit a message on the bus. Returns 0 if
 *      data is accepted for transmission, -EBUSY while rejecting
 *      if the remote hasn't yet read the last data sent. Actual
 *      transmission of data is reported by the controller via
 *      mbox_chan_txdone (if it has some TX ACK irq). It must not
 *      sleep.

As long as send_data() does not sleep, ideas are welcome.

Cheers.



More information about the Linux-mediatek mailing list