[PATCH v3 21/24] pmdomain: core: Leave powered-on genpds on until late_initcall_sync
Ulf Hansson
ulf.hansson at linaro.org
Tue Jul 15 04:34:47 PDT 2025
On Tue, 15 Jul 2025 at 13:32, Ulf Hansson <ulf.hansson at linaro.org> wrote:
>
> On Tue, 15 Jul 2025 at 12:28, Jon Hunter <jonathanh at nvidia.com> wrote:
> >
> > Hi Ulf,
> >
> > On 10/07/2025 15:54, Ulf Hansson wrote:
> > > On Thu, 10 Jul 2025 at 14:26, Marek Szyprowski <m.szyprowski at samsung.com> wrote:
> > >>
> > >> On 01.07.2025 13:47, Ulf Hansson wrote:
> > >>> Powering-off a genpd that was on during boot, before all of its consumer
> > >>> devices have been probed, is certainly prone to problems.
> > >>>
> > >>> As a step to improve this situation, let's prevent these genpds from being
> > >>> powered-off until genpd_power_off_unused() gets called, which is a
> > >>> late_initcall_sync().
> > >>>
> > >>> Note that this still doesn't guarantee that all the consumer devices have
> > >>> been probed before we allow the genpds to be powered off. Yet, this should
> > >>> be a step in the right direction.
> > >>>
> > >>> Suggested-by: Saravana Kannan <saravanak at google.com>
> > >>> Tested-by: Hiago De Franco <hiago.franco at toradex.com> # Colibri iMX8X
> > >>> Tested-by: Tomi Valkeinen <tomi.valkeinen at ideasonboard.com> # TI AM62A,Xilinx ZynqMP ZCU106
> > >>> Signed-off-by: Ulf Hansson <ulf.hansson at linaro.org>
> > >>
> > >> This change has a side effect on some Exynos-based boards that have a
> > >> display and whose bootloader is configured to set up a splash screen on
> > >> it. Since today's linux-next, those boards fail to boot because of an
> > >> IOMMU page fault.
> > >
> > > Thanks for reporting, let's try to fix this as soon as possible then.
> > >
> > >>
> > >> This happens because the display controller is enabled and configured to
> > >> perform scanout from the splash-screen buffer until the respective
> > >> driver resets it in its probe() function. This however doesn't
> > >> work with the IOMMU, which is probed earlier than the display
> > >> controller driver, which in turn causes an IOMMU page fault once the
> > >> IOMMU driver gets attached. This worked before applying this patch,
> > >> because the power domain of the display controller was simply turned off
> > >> early, effectively resetting the display controller.
> > >
> > > I can certainly try to help to find a solution, but I believe I need
> > > some more details of what is happening.
> > >
> > > Perhaps you can point me to some relevant DTS file to start with?
> > >
> > >>
> > >> This has been discussed a bit recently:
> > >> https://lore.kernel.org/all/544ad69cba52a9b87447e3ac1c7fa8c3@disroot.org/
> > >> and I can add a workaround for this issue in the bootloaders of those
> > >> boards, but this is something that has to be somehow addressed in a
> > >> generic way.
> > >
> > > It kind of sounds like there is a missing power-domain not being
> > > described in DT for the IOMMU, but I might have understood the whole
> > > thing wrong.
> > >
> > > Let's see if we can work something out in the next few days, otherwise
> > > we need to find another way to let some genpds for these platforms
> > > opt out of this new behaviour.
> >
> > Have you found any resolution for this? I have also noticed a boot
> > regression on one of our Tegra210 boards and bisect is pointing to this
> > commit. I don't see any particular crash, but a hang on boot.
>
> Thanks for reporting!
>
> For Exynos we opt out of the behaviour by enforcing a sync_state of
> all PM domains upfront [1], which means before any devices get
> attached.
>
> Even if that defeats the purpose of the $subject series, this was one
> way forward that solved the problem. When the boot-ordering problem
> (that's how I understood the issue) for Exynos gets resolved, we
> should be able to drop the hack, at least that's the idea.
>
> >
> > If there is any debug we can enable to see which pmdomain is the problem
> > let me know.
>
> There aren't many debug prints in genpd that I think make much sense
> to enable, but you can always give it a try. Since you are hanging,
> obviously you can't look at the genpd debugfs data...
>
> Note that, the interesting PM domains are those that are powered-on
> when calling pm_genpd_init(). As a start, I would add some debug
> prints in () to see which PM domains are relevant to track.
s/()/tegra_powergate_add()/
> Potentially you could then try to power them off and register them
> accordingly with genpd. One by one, to see which of them is causing
> the problem.
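
To make the "which domains matter" step concrete, here is a minimal
userspace sketch of the kind of debug print meant above. It is a model, not
kernel code: struct model_powergate, model_report_on_at_init() and the
domain names are all made up for illustration. In the real driver the
equivalent print would sit in tegra_powergate_add(), near the
pm_genpd_init() call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a powergate about to be registered with genpd. */
struct model_powergate {
	const char *name;	/* illustrative name, not a real Tegra domain */
	bool is_off;		/* power state observed at registration time */
};

/*
 * Print every domain that is powered on at registration time and return
 * how many there are. These are the domains worth tracking, since they are
 * the ones that would be kept on until late_initcall_sync.
 */
size_t model_report_on_at_init(const struct model_powergate *pg, size_t n)
{
	size_t on = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pg[i].is_off) {
			printf("powergate %s: on at init\n", pg[i].name);
			on++;
		}
	}

	return on;
}
```

With the list of powered-on domains in hand, they can be powered off and
registered one at a time to narrow down the one causing the hang.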
>
> Another option could be to add a new genpd config flag
> (GENPD_FLAG_DONT_STAY_ON or something along those lines), that informs
> genpd to not set the genpd->stay_on in pm_genpd_init(). Then
> tegra_powergate_add() would have to set GENPD_FLAG_DONT_STAY_ON for
> those genpds that really need it.
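
A rough userspace model of how such a flag could behave, in case it helps
the discussion. Everything here is an assumption for illustration:
GENPD_FLAG_DONT_STAY_ON does not exist today, its bit value is arbitrary,
and struct model_genpd with its helpers only mimics the stay_on handling
described above rather than reproducing the genpd core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag; the bit value is arbitrary in this model. */
#define GENPD_FLAG_DONT_STAY_ON	(1U << 0)

/* Minimal model of the genpd fields that matter for this behaviour. */
struct model_genpd {
	const char *name;
	unsigned int flags;
	bool is_off;
	bool stay_on;
};

/*
 * Model of the init path: a domain that is on at init is marked stay_on,
 * unless the provider opted out with the hypothetical flag.
 */
void model_genpd_init(struct model_genpd *genpd, bool is_off)
{
	assert(genpd != NULL);

	genpd->is_off = is_off;
	genpd->stay_on = !is_off &&
			 !(genpd->flags & GENPD_FLAG_DONT_STAY_ON);
}

/* Model of the late point where the stay_on restriction is lifted. */
void model_genpd_release_stay_on(struct model_genpd *genpd)
{
	genpd->stay_on = false;
}

/* A powered-on domain may be powered off only when stay_on is clear. */
bool model_genpd_may_power_off(const struct model_genpd *genpd)
{
	return !genpd->is_off && !genpd->stay_on;
}
```

In this model a provider such as tegra_powergate_add() would set the flag
only for the genpds that really need the old behaviour, and every other
domain keeps the new default.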
>
> Kind regards
> Uffe
>
> [1]
> https://lore.kernel.org/all/20250711114719.189441-1-ulf.hansson@linaro.org/