[PATCH v3 21/24] pmdomain: core: Leave powered-on genpds on until late_initcall_sync
Ulf Hansson
ulf.hansson at linaro.org
Tue Jul 15 04:32:46 PDT 2025
On Tue, 15 Jul 2025 at 12:28, Jon Hunter <jonathanh at nvidia.com> wrote:
>
> Hi Ulf,
>
> On 10/07/2025 15:54, Ulf Hansson wrote:
> > On Thu, 10 Jul 2025 at 14:26, Marek Szyprowski <m.szyprowski at samsung.com> wrote:
> >>
> >> On 01.07.2025 13:47, Ulf Hansson wrote:
> >>> Powering-off a genpd that was on during boot, before all of its consumer
> >>> devices have been probed, is certainly prone to problems.
> >>>
> >>> As a step to improve this situation, let's prevent these genpds from being
> >>> powered-off until genpd_power_off_unused() gets called, which is a
> >>> late_initcall_sync().
> >>>
> >>> Note that this still doesn't guarantee that all the consumer devices have
> >>> been probed before we allow the genpds to be powered off. Yet, this should
> >>> be a step in the right direction.
> >>>
> >>> Suggested-by: Saravana Kannan <saravanak at google.com>
> >>> Tested-by: Hiago De Franco <hiago.franco at toradex.com> # Colibri iMX8X
> >>> Tested-by: Tomi Valkeinen <tomi.valkeinen at ideasonboard.com> # TI AM62A,Xilinx ZynqMP ZCU106
> >>> Signed-off-by: Ulf Hansson <ulf.hansson at linaro.org>
> >>
> >> This change has a side effect on some Exynos-based boards that have a
> >> display and whose bootloader is configured to set up a splash screen on
> >> it. Since today's linux-next, those boards fail to boot because of an
> >> IOMMU page fault.
> >
> > Thanks for reporting, let's try to fix this as soon as possible then.
> >
> >>
> >> This happens because the display controller is enabled and configured to
> >> perform the scanout from the splash-screen buffer until the respective
> >> driver resets it in its probe() function. This however doesn't
> >> work with the IOMMU, which is probed earlier than the display
> >> controller driver, which in turn causes an IOMMU page fault once the IOMMU
> >> driver gets attached. This worked before applying this patch, because
> >> the power domain of the display controller was simply turned off early,
> >> effectively resetting the display controller.
> >
> > I can certainly try to help to find a solution, but I believe I need
> > some more details of what is happening.
> >
> > Perhaps you can point me to some relevant DTS file to start with?
> >
> >>
> >> This has been discussed a bit recently:
> >> https://lore.kernel.org/all/544ad69cba52a9b87447e3ac1c7fa8c3@disroot.org/
> >> and I can add a workaround for this issue in the bootloaders of those
> >> boards, but this is something that has to be somehow addressed in a
> >> generic way.
> >
> > It kind of sounds like there is a missing power-domain not being
> > described in DT for the IOMMU, but I might have understood the whole
> > thing wrong.
> >
> > Let's see if we can work something out in the next few days, otherwise
> > we need to find another way to let some genpds for these platforms
> > opt out of this new behaviour.
>
> Have you found any resolution for this? I have also noticed a boot
> regression on one of our Tegra210 boards and bisect is pointing to this
> commit. I don't see any particular crash, but a hang on boot.

Thanks for reporting!

For Exynos we opt out of the behaviour by enforcing a sync_state of
all PM domains upfront [1], that is, before any devices get
attached.

Even if that defeats the purpose of the $subject series, this was one
way forward that solved the problem. When the boot-ordering problem
(that's how I understood the issue) for Exynos gets resolved, we
should be able to drop the hack; at least that's the idea.

>
> If there is any debug we can enable to see which pmdomain is the problem
> let me know.

There aren't many debug prints in genpd that I think make much sense
to enable, but you can always give it a try. Since you are hanging,
obviously you can't look at the genpd debugfs data...

Note that the interesting PM domains are those that are powered-on
when calling pm_genpd_init(). As a start, I would add some debug
prints in () to see which PM domains are relevant to track.
Potentially you could then try to power them off and register them
accordingly with genpd, one by one, to see which of them is causing
the problem.

Another option could be to add a new genpd config flag
(GENPD_FLAG_DONT_STAY_ON or something along those lines) that tells
genpd not to set genpd->stay_on in pm_genpd_init(). Then
tegra_powergate_add() would have to set GENPD_FLAG_DONT_STAY_ON for
those genpds that really need it.
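
To make the flag idea a bit more concrete, here is a rough user-space
model of it. This is only a sketch and not kernel code: every model_*
name and MODEL_FLAG_DONT_STAY_ON are made up for illustration, and the
stay_on handling is simplified from how it is described in this thread.
The debug print on registration also shows the kind of tracking
suggested above for domains that are powered-on at init.

```c
/* User-space model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag mirroring the proposed GENPD_FLAG_DONT_STAY_ON. */
#define MODEL_FLAG_DONT_STAY_ON (1u << 0)

struct model_genpd {
	const char *name;
	unsigned int flags;
	bool is_on;
	bool stay_on;	/* blocks power-off until the late "unused" pass */
};

/* Models registration: a domain that is on at init is kept on unless it opts out. */
void model_genpd_init(struct model_genpd *pd, const char *name,
		      unsigned int flags, bool is_off)
{
	assert(pd && name);

	pd->name = name;
	pd->flags = flags;
	pd->is_on = !is_off;
	pd->stay_on = !is_off && !(flags & MODEL_FLAG_DONT_STAY_ON);

	/* Debug print to see which domains are powered-on at registration. */
	if (!is_off)
		fprintf(stderr, "model: %s registered powered-on, stay_on=%d\n",
			name, pd->stay_on);
}

/* Returns true if the domain is off afterwards, false if it had to stay on. */
bool model_genpd_power_off(struct model_genpd *pd)
{
	assert(pd);

	if (!pd->is_on)
		return true;
	if (pd->stay_on)
		return false;

	pd->is_on = false;
	return true;
}

/* Models the late pass: drop the stay_on restriction, then power off. */
void model_power_off_unused(struct model_genpd *pd)
{
	assert(pd);

	pd->stay_on = false;
	model_genpd_power_off(pd);
}
```

In this model, a domain registered as powered-on with
MODEL_FLAG_DONT_STAY_ON can be powered off right away, while one
registered without the flag refuses to power off until
model_power_off_unused() has run.
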
Kind regards
Uffe
[1]
https://lore.kernel.org/all/20250711114719.189441-1-ulf.hansson@linaro.org/