[PATCH 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh

Doug Anderson dianders at chromium.org
Fri Mar 7 08:53:37 PST 2025


Hi,

On Thu, Mar 6, 2025 at 6:18 PM Yicong Yang <yangyicong at huawei.com> wrote:
>
> From: Yicong Yang <yangyicong at hisilicon.com>
>
> arm64 depends on the cpufreq driver to gain the maximum cpu frequency
> to convert the watchdog_thresh to perf event period. cpufreq drivers
> like cppc_cpufreq will be initialized lately after the initializing of
> the hard lockup detector so just use a safe cpufreq which will be
> inaccurency. Use a cpufreq notifier to adjust the event's period to
> a more accurate one.
>
> Signed-off-by: Yicong Yang <yangyicong at hisilicon.com>
> ---
>  arch/arm64/kernel/watchdog_hld.c | 34 ++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
> index dcd25322127c..719ae39e8bc0 100644
> --- a/arch/arm64/kernel/watchdog_hld.c
> +++ b/arch/arm64/kernel/watchdog_hld.c
> @@ -34,3 +34,37 @@ bool __init arch_perf_nmi_is_available(void)
>          */
>         return arm_pmu_irq_is_nmi();
>  }
> +
> +static int watchdog_freq_notifier_callback(struct notifier_block *nb,
> +                                          unsigned long val, void *data)
> +{
> +       struct cpufreq_policy *policy = data;
> +       unsigned long max_cpu_freq;

Feels like max_cpu_freq should be a u64. It probably should have been
u64 in hw_nmi_get_sample_period() as well. Otherwise you could end up
with a problem if the max CPU frequency is above ~4.3GHz and "unsigned
long" is 32-bits (I think it technically could be?)


> +       u64 new_period;
> +       int cpu;
> +
> +       if (val != CPUFREQ_CREATE_POLICY)
> +               return 0;

nit: maybe better documenting to return "NOTIFY_DONE" instead of 0?


> +       for_each_cpu(cpu, policy->cpus) {
> +               max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
> +               if (!max_cpu_freq)
> +                       continue;
> +
> +               new_period = watchdog_thresh * max_cpu_freq;
> +               hardlockup_detector_perf_adjust_period(cpu, new_period);
> +       }
> +
> +       return 0;
> +}
> +
> +static struct notifier_block watchdog_freq_notifier = {
> +       .notifier_call = watchdog_freq_notifier_callback,
> +};
> +
> +static int __init init_watchdog_freq_notifier(void)
> +{
> +       return cpufreq_register_notifier(&watchdog_freq_notifier,
> +                                        CPUFREQ_POLICY_NOTIFIER);
> +}
> +core_initcall(init_watchdog_freq_notifier);

I haven't dug into everything, but don't you have a potential blind
spot, though? What if the order is this:

1. hw_nmi_get_sample_period() is called
2. cpufreq driver probes
3. init_watchdog_freq_notifier() initcall runs

Since the cpufreq driver proved before you registered for the
notifier, won't you miss the creation of the policy? Maybe instead of
registering for an initcall you should somehow trigger off the first
call to hw_nmi_get_sample_period() and register at that point in time?

-Doug



More information about the linux-arm-kernel mailing list