[PATCH v3 0/9] PM / Domains: Fix race conditions during boot

Rafael J. Wysocki rjw at rjwysocki.net
Fri Oct 31 17:20:38 PDT 2014


On Friday, October 31, 2014 10:16:14 AM Ulf Hansson wrote:
> On 24 October 2014 18:12, Kevin Hilman <khilman at kernel.org> wrote:
> > Ulf Hansson <ulf.hansson at linaro.org> writes:
> >
> >> Changes in v3:
> >>       -Rework the entire intermediate step which was suggested in v2.
> >>        That means solving the race condition, but also cope with PM domains
> >>        that are initialized in powered off state.
> >>
> >> Changes in v2:
> >>       -Added some acks.
> >>       -Updated commit messages.
> >>       -Included a wider audience of the patchset. V1 was lacking SoC
> >>        maintainers.
> >>
> >> Here are link to the first patchset, were some discussion started.
> >> http://marc.info/?l=linux-pm&m=141208104729597&w=2
> >>
> >> There may be more than one device in a PM domain which then will be
> >> probed at different points in time.
> >>
> >> Depending on timing and runtime PM support, in for the device related
> >> driver/subsystem, a PM domain may be advised to power off after a
> >> successful probe sequence.
> >>
> >> A general requirement for a device within a PM domain, is that the
> >> PM domain must stay powered during the probe sequence. To cope with
> >> such requirement, let's add two new APIs, dev_pm_domain_get|put().
> >>
> >> These APIs are intended to be invoked from subsystem-level code and the
> >> calls between get/put needs to be balanced.
> >>
> >> dev_pm_domain_get(), tells the PM domain that it needs to increase a
> >> usage count and to keep supplying power. dev_pm_domain_put(), does the
> >> opposite.
> >
> > I'm confused. Why arent' pm_runtime_get*() and pm_runtime_put*() working?
> 
> See, below.
> 
> >
> > What's not explained here (or what I'm not understanding) is why a PM
> > domain is powering off if it has active devices.
> 
> It doesn't. The problem is that using pm_runtime_get_sync() in this
> path is not working.
> 
> Now, I failed to include some of the important information from
> previous discussions around this patchset. Let me iterate the patchset
> with better commit messages, but let's first continue this thread.
> 
> Here are some of the previous discussion:
> 
> http://marc.info/?l=linux-pm&m=141270897014653&w=2
> http://marc.info/?l=linux-pm&m=141208104729597&w=2
> 
> Below is a summary of why I think "pm_runtime_get_sync()" isn't working for us.
> 
> 1)
> It's bad practice to use pm_runtime_get_sync() in the ->probe() path,

Honestly, I'm no longer amused.

> to bring your resources to full power. The consequence would be a
> driver that requires CONFIG_PM_RUNTIME to be even functional, which
> just isn't acceptable.

Sorry, but this is utter nonsense.

CONFIG_PM_RUNTIME unset means "no runtime PM at all", so all drivers can expect
everything they need to be always on.  If that is not the case, then someone is
doing runtime PM behind the scenes and therefore cheating.  Or in different
words, for CONFIG_PM_RUNTIME unset bus types, platforms etc must ensure that
everything is on from the drivers' perspective.

If that is the case, then calling pm_runtime_get_sync() from ->probe
for CONFIG_PM_RUNTIME unset simply doesn't matter.

Now, for CONFIG_PM_RUNTIME enabled, if power domains are in use, doing
pm_runtime_get_sync() from ->probe is the only way the driver can ensure
in a non-racy way that the device will be accessible going forward.

Why?  Simply because the probing need not happen during system initialization.
It very well may take places when the other devices in the same domain have
beein in use for quite a while and have been using runtime PM (in which
case the domain may go off at any time unless it is explicityly prevented from
doing that).

One thing that you may be missing is that, for CONFIG_PM_RUNTIME set,
runtime PM has to be either enabled or disabled for all devices in one
domain (and if it is disabled, then the domain needs to be always on for
all practical purposes).  Otherwise you can't just make all of them happy
at the same time.  The documentation doesn't cover this, because it had been
written before we even started to consider power domains.

> Drivers that behaves well within this context, follows the runtime PM
> documentation/recommendation.

So please go to Documentation/power/runtime_pm.txt and read it again.  Quote:

"If the default initial runtime PM status of the device (i.e. 'suspended')
reflects the actual state of the device, its bus type's or its driver's
->probe() callback will likely need to wake it up using one of the PM core's
helper functions described in Section 4.  In that case, pm_runtime_resume()
should be used.  Of course, for this purpose the device's runtime PM has to be
enabled earlier by calling pm_runtime_enable()."

So how is this in agreement with what you're saying, I wonder?

> They use pm_runtime_set_active() during ->probe() to reflect that their
> devices are fully powered and capable of handling I/O.

And how the heck can a driver (whose device belongs to a power domain) be sure
that the device is "fully powered and capable of handling I/O" duing ->probe()?

> You may also have a look at these discussions which also touches this
> topic, but within a context of another patchset.
> https://lkml.org/lkml/2014/10/23/95

Which looks like reiterating the same incorrect arguments.

> 2)
> Another good example why pm_runtime_get_sync() is a bad solution to
> our problem, is the amba bus. Before it invokes the driver's ->probe()
> callback it does the following.
> - "enable bus clock"
> - pm_runtime_get_noresume()
> - pm_runtime_set_active()
> - pm_runtime_enable()

Which may not work if power domains are involved.

> For these scenarios a pm_runtime_get_sync() from any of amba driver's
> ->probe() callback wouldn't have any effect, since the device is
> already active. In other words, the resources needs to be "manually"
> enabled.

There seems to be a lot of confusion around this, so let me summarize:

- For CONFIG_PM_RUNTIME unsed drivers should be able to *safely* assume
  that they can access devices at any time.  All of the runtime PM helpers
  don't matter then (up to the error codes returned).

- For CONFIG_PM_RUNTIME set, if power domains are in use, then runtime PM
  has to be either enabled or disabled for all devices in one domain (and
  if disabled, the expected behavior is like the above).

- For CONFIG_PM_RUNTIME set, if power domains are in use, drivers (or bus
  types etc) should not make any assumptions about devices being fully
  powered without ensuring that this is the case, usually by holding an
  active PM runtime reference to the device before accessing it.

Now, there may be places in the core/bus type code that aren't aligned with
the above, so they need to be fixed this way or another.

HTH,
Rafael




More information about the linux-arm-kernel mailing list