[PATCH] pmdomain: mediatek: fix race condition in power on/power off sequences

AngeloGioacchino Del Regno angelogioacchino.delregno at collabora.com
Wed Nov 29 04:52:40 PST 2023


Il 29/11/23 12:31, Eugen Hristev ha scritto:
> It can happen that during the power off sequence for a power domain
> another power on sequence is started, and it can lead to powering on and
> off in the same time for the similar power domain.
> This can happen if parallel probing occurs: one device starts probing, and
> one power domain is probe deferred, this leads to all power domains being
> rolled back and powered off, while in the same time another device starts
> probing and requests powering on the same power domains or similar.
> 
> This was encountered on MT8186, when the sequence is :
> Power on SSUSB
> Power on SSUSB_P1
> Power on DIS
> 	-> probe deferred
> Power off DIS
> Power off SSUSB_P1
> Power off SSUSB
> 
> During the sequence of powering off SSUSB, some new similar sequence starts,
> and during the power on of SSUSB, clocks are enabled.
> In this case, powering off SSUSB fails from the first sequence, because
> power off ACK bit check times out (as clocks are powered back on by the second
> sequence). In consequence, powering it on also times out, and it leads to
> the whole power domain in a bad state.
> 
> To solve this issue, added a mutex that locks the whole power off/power on
> sequence such that it would never happen that multiple sequences try to
> enable or disable the same power domain in parallel.
> 
> Fixes: 59b644b01cf4 ("soc: mediatek: Add MediaTek SCPSYS power domains")
> Signed-off-by: Eugen Hristev <eugen.hristev at collabora.com>

I don't think that it's a race between genpd_power_on() and genpd_power_off() calls
at all, because genpd *does* have locking after all... at least for probe and for
parents of a power domain (and more anyway).

As far as I remember, what happens when you start .probe()'ing a device is:
platform_probe() -> dev_pm_domain_attach() -> genpd_dev_pm_attach()

There, you end up with

	if (power_on) {
		genpd_lock(pd);
		ret = genpd_power_on(pd, 0);
		genpd_unlock(pd);
	}

...but when you fail probing, you go with genpd_dev_pm_detach(), which then calls

	/* Check if PM domain can be powered off after removing this device. */
	genpd_queue_power_off_work(pd);

but even then, you end up being in a worker doing

	genpd_lock(genpd);
	genpd_power_off(genpd, false, 0);
	genpd_unlock(genpd);

...so I don't understand why this mutex can resolve the situation here (also: are
you really sure that the race is solved like that?)

I'd say that this probably needs more justification and a trace of the actual
situation here.

Besides, if this really resolves the issue, I would prefer seeing variants of
scpsys_power_{on,off}() functions, because we anyway don't need to lock mutexes
during this driver's probe (add_subdomain calls scpsys_power_on()).
In that case, `scpsys_power_on_unlocked()` would be an idea... but still, please
analyze why your solution works, if it does, because I'm not convinced.

Cheers,
Angelo

> ---
>   drivers/pmdomain/mediatek/mtk-pm-domains.c | 24 +++++++++++++++++-----
>   1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pmdomain/mediatek/mtk-pm-domains.c b/drivers/pmdomain/mediatek/mtk-pm-domains.c
> index d5f0ee05c794..4f136b47e539 100644
> --- a/drivers/pmdomain/mediatek/mtk-pm-domains.c
> +++ b/drivers/pmdomain/mediatek/mtk-pm-domains.c
> @@ -9,6 +9,7 @@
>   #include <linux/io.h>
>   #include <linux/iopoll.h>
>   #include <linux/mfd/syscon.h>
> +#include <linux/mutex.h>
>   #include <linux/of.h>
>   #include <linux/of_clk.h>
>   #include <linux/platform_device.h>
> @@ -56,6 +57,7 @@ struct scpsys {
>   	struct device *dev;
>   	struct regmap *base;
>   	const struct scpsys_soc_data *soc_data;
> +	struct mutex mutex;
>   	struct genpd_onecell_data pd_data;
>   	struct generic_pm_domain *domains[];
>   };
> @@ -238,9 +240,13 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>   	bool tmp;
>   	int ret;
>   
> +	mutex_lock(&scpsys->mutex);
> +
>   	ret = scpsys_regulator_enable(pd->supply);
> -	if (ret)
> +	if (ret) {
> +		mutex_unlock(&scpsys->mutex);
>   		return ret;
> +	}
>   
>   	ret = clk_bulk_prepare_enable(pd->num_clks, pd->clks);
>   	if (ret)
> @@ -291,6 +297,7 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>   			goto err_enable_bus_protect;
>   	}
>   
> +	mutex_unlock(&scpsys->mutex);
>   	return 0;
>   
>   err_enable_bus_protect:
> @@ -305,6 +312,7 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>   	clk_bulk_disable_unprepare(pd->num_clks, pd->clks);
>   err_reg:
>   	scpsys_regulator_disable(pd->supply);
> +	mutex_unlock(&scpsys->mutex);
>   	return ret;
>   }
>   
> @@ -315,13 +323,15 @@ static int scpsys_power_off(struct generic_pm_domain *genpd)
>   	bool tmp;
>   	int ret;
>   
> +	mutex_lock(&scpsys->mutex);
> +
>   	ret = scpsys_bus_protect_enable(pd);
>   	if (ret < 0)
> -		return ret;
> +		goto err_mutex_unlock;
>   
>   	ret = scpsys_sram_disable(pd);
>   	if (ret < 0)
> -		return ret;
> +		goto err_mutex_unlock;
>   
>   	if (pd->data->ext_buck_iso_offs && MTK_SCPD_CAPS(pd, MTK_SCPD_EXT_BUCK_ISO))
>   		regmap_set_bits(scpsys->base, pd->data->ext_buck_iso_offs,
> @@ -340,13 +350,15 @@ static int scpsys_power_off(struct generic_pm_domain *genpd)
>   	ret = readx_poll_timeout(scpsys_domain_is_on, pd, tmp, !tmp, MTK_POLL_DELAY_US,
>   				 MTK_POLL_TIMEOUT);
>   	if (ret < 0)
> -		return ret;
> +		goto err_mutex_unlock;
>   
>   	clk_bulk_disable_unprepare(pd->num_clks, pd->clks);
>   
>   	scpsys_regulator_disable(pd->supply);
>   
> -	return 0;
> +err_mutex_unlock:
> +	mutex_unlock(&scpsys->mutex);
> +	return ret;
>   }
>   
>   static struct
> @@ -700,6 +712,8 @@ static int scpsys_probe(struct platform_device *pdev)
>   		return PTR_ERR(scpsys->base);
>   	}
>   
> +	mutex_init(&scpsys->mutex);
> +
>   	ret = -ENODEV;
>   	for_each_available_child_of_node(np, node) {
>   		struct generic_pm_domain *domain;





More information about the linux-arm-kernel mailing list