[REPORT_ISSUE]: RK3399 pd power down failed
Rafael J. Wysocki
rjw at rjwysocki.net
Tue Feb 23 12:09:08 EST 2021
On Tuesday, February 23, 2021 12:30:39 PM CET Ulf Hansson wrote:
> On Wed, 20 Jan 2021 at 10:30, zhangqing at rock-chips.com
> <zhangqing at rock-chips.com> wrote:
> >
> > Hi, Heiko :
> >
> > In rk3399 evb board, I found a probabilistic problem about PD. Turning off PD occasionally failed.
> >
> > log show:
> > Open the vop
> > #modetest -M rockchip -s 42 at 36:1536x2048 -P 31 at 36:1536x2048 at AR24 -a
> >
> > close the vop
> > #enter
> >
> > # cat sys/kernel/debug/pm_genpd/pm_genpd_summary
> > domain status slaves
> > /device runtime status
> > ----------------------------------------------------------------------
> > pd_vopl off
> > pd_vopb on
> > /devices/platform/ff903f00.iommu suspended
> > /devices/platform/ff900000.vop suspended
> >
> > I have checked the codes and concluded that there is a window of time for PD to be closed when using the device link. Once queue_work is executed immediately, PD power off may be failed.
> > The process is as follows:
> >
> > VOP requests to power off PD:
> > pm_runtime_put_sync(vop->dev)
> > -> rpm_idle(vop)
> > -> rpm_suspend(vop)
> > -> __update_runtime_status(dev, RPM_SUSPENDING)
> > -> rpm_callback(vop)
> > -> __rpm_callback(vop)
> > -> do power off pd callback(genpd_power_off)
> > -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : active,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > -> Handle link device callbacks according to device link(rpm_put_suppliers)
> > -> pm_runtime_put(link->supplier)
> > -> queue_work(pm_wq, &dev->power.work), execute immediately
> > ->rpm_idle(iommu)
> > -> rpm_suspend(iommu)
> > -> rpm_callback(iommu)
> > -> rk_iommu_suspend
> > -> do power off pd callback(genpd_power_off)
> > -> list_for_each_entry(pdd, &genpd->dev_list, list_node), ff900000.vop: suspending, ff903f00.iommu : suspending,so not_suspended = 2 return -EBUSY; Not really power off PD。
> > -> iommu do __update_runtime_status(dev, RPM_SUSPENDED)
> > -> vop do __update_runtime_status(dev, RPM_SUSPENDED)
>
> So, rpm_suspend() tries to suspend the supplier device link via
> rpm_put_suppliers(), before it has updated its consumer device's state
> to RPM_SUSPENDED.
>
> This looks worrying to me, both because it's seems wrong to allow a
> supplier to be suspended before a consumers device's state has reached
> RPM_SUSPENDED - but also because it's not consistent with the way we
> treat parent/child devices. The child's state will always be set to
> RPM_SUSPENDED, before we try to suspend its parent by calling
> rpm_idle() for it in rpm_suspend().
>
> Rafael, what's your take on this? Would it make sense to align the
> behavior for consumer/supplier-links in rpm_suspend() according to
> child/parents?
Suspending the suppliers before changing the consumer RPM status to
"suspended" is indeed incorrect, which is something I overlooked when
writing the code in question.
Fortunately, it seems to be relatively easy to address.
Please see the appended tentative patch (untested). It also avoids reading
runtime_status outside the lock which is arguably fishy.
---
drivers/base/power/runtime.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -330,7 +330,11 @@ static int __rpm_callback(int (*cb)(stru
if (dev->power.irq_safe) {
spin_unlock(&dev->power.lock);
+ } else if (!use_links) {
+ spin_unlock_irq(&dev->power.lock);
} else {
+ bool get = dev->power.runtime_status == RPM_RESUMING;
+
spin_unlock_irq(&dev->power.lock);
/*
@@ -340,7 +344,7 @@ static int __rpm_callback(int (*cb)(stru
* routine returns, so it is safe to read the status outside of
* the lock.
*/
- if (use_links && dev->power.runtime_status == RPM_RESUMING) {
+ if (get) {
idx = device_links_read_lock();
retval = rpm_get_suppliers(dev);
@@ -355,7 +359,21 @@ static int __rpm_callback(int (*cb)(stru
if (dev->power.irq_safe) {
spin_lock(&dev->power.lock);
+ } if (!use_links) {
+ spin_lock_irq(&dev->power.lock);
} else {
+ bool put;
+
+ spin_lock_irq(&dev->power.lock);
+
+ put = dev->power.runtime_status == RPM_SUSPENDING && !retval;
+ if (put)
+ __update_runtime_status(dev, RPM_SUSPENDED);
+ else
+ put = dev->power.runtime_status == RPM_RESUMING && retval;
+
+ spin_unlock_irq(&dev->power.lock);
+
/*
* If the device is suspending and the callback has returned
* success, drop the usage counters of the suppliers that have
@@ -363,9 +381,7 @@ static int __rpm_callback(int (*cb)(stru
*
* Do that if resume fails too.
*/
- if (use_links
- && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)
- || (dev->power.runtime_status == RPM_RESUMING && retval))) {
+ if (put) {
idx = device_links_read_lock();
fail:
More information about the Linux-rockchip
mailing list