[PATCH 2/2] arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh

Wed Mar 12 00:47:48 PDT 2025

On 2025/3/11 23:41, Doug Anderson wrote:
> Hi,
> 
> On Tue, Mar 11, 2025 at 1:33 AM Yicong Yang <yangyicong at huawei.com> wrote:
>>
>> On 2025/3/8 0:53, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Thu, Mar 6, 2025 at 6:18 PM Yicong Yang <yangyicong at huawei.com> wrote:
>>>>
>>>> From: Yicong Yang <yangyicong at hisilicon.com>
>>>>
>>>> arm64 depends on the cpufreq driver to gain the maximum cpu frequency
>>>> to convert the watchdog_thresh to perf event period. cpufreq drivers
>>>> like cppc_cpufreq will be initialized lately after the initializing of
>>>> the hard lockup detector so just use a safe cpufreq which will be
>>>> inaccurency. Use a cpufreq notifier to adjust the event's period to
>>>> a more accurate one.
>>>>
>>>> Signed-off-by: Yicong Yang <yangyicong at hisilicon.com>
>>>> ---
>>>>  arch/arm64/kernel/watchdog_hld.c | 34 ++++++++++++++++++++++++++++++++
>>>>  1 file changed, 34 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
>>>> index dcd25322127c..719ae39e8bc0 100644
>>>> --- a/arch/arm64/kernel/watchdog_hld.c
>>>> +++ b/arch/arm64/kernel/watchdog_hld.c
>>>> @@ -34,3 +34,37 @@ bool __init arch_perf_nmi_is_available(void)
>>>>          */
>>>>         return arm_pmu_irq_is_nmi();
>>>>  }
>>>> +
>>>> +static int watchdog_freq_notifier_callback(struct notifier_block *nb,
>>>> +                                          unsigned long val, void *data)
>>>> +{
>>>> +       struct cpufreq_policy *policy = data;
>>>> +       unsigned long max_cpu_freq;
>>>
>>> Feels like max_cpu_freq should be a u64. It probably should have been
>>> u64 in hw_nmi_get_sample_period() as well. Otherwise you could end up
>>> with a problem if the max CPU frequency is above ~4.3GHz and "unsigned
>>> long" is 32-bits (I think it technically could be?)
>>>
>>
>> I guess this won't happen for arm64 and we're in kernel space. here I
>> keep the same type with hw_nmi_get_sample_period() which is also using
>> "unsigned long".
> 
> I guess I won't insist, but given that you use u64 explicitly
> elsewhere it feels like you should be explicit that you need 64-bits
> here too.
> 
> 
>>>> +       for_each_cpu(cpu, policy->cpus) {
>>>> +               max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
>>>> +               if (!max_cpu_freq)
>>>> +                       continue;
>>>> +
>>>> +               new_period = watchdog_thresh * max_cpu_freq;
>>>> +               hardlockup_detector_perf_adjust_period(cpu, new_period);
>>>> +       }
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static struct notifier_block watchdog_freq_notifier = {
>>>> +       .notifier_call = watchdog_freq_notifier_callback,
>>>> +};
>>>> +
>>>> +static int __init init_watchdog_freq_notifier(void)
>>>> +{
>>>> +       return cpufreq_register_notifier(&watchdog_freq_notifier,
>>>> +                                        CPUFREQ_POLICY_NOTIFIER);
>>>> +}
>>>> +core_initcall(init_watchdog_freq_notifier);
>>>
>>> I haven't dug into everything, but don't you have a potential blind
>>> spot, though? What if the order is this:
>>>
>>> 1. hw_nmi_get_sample_period() is called
>>> 2. cpufreq driver probes
>>> 3. init_watchdog_freq_notifier() initcall runs
>>>
>>> Since the cpufreq driver proved before you registered for the
>>> notifier, won't you miss the creation of the policy? Maybe instead of
>>> registering for an initcall you should somehow trigger off the first
>>> call to hw_nmi_get_sample_period() and register at that point in time?
>>>
>>
>> for arm64 the watchdog_perf is registered after armv8 PMU driver's init,
>> so the sequence is like:
>>
>> # core_initcall
>> init_watchdog_freq_notifier()
>> # device_initcall
>> armv8_pmu_driver_init()
>>   lockup_detector_retry_init()
>>     schedule_work(&detector_work);
>> # in detector work
>> hw_nmi_get_sample_period() // notifier's already been registered
>>
>> so we'll always get the correct frequency, either by the cpufreq
>> already registered or by the notifier which will update the
>> frequency. Please correct me if I missed anything :)
> 
> Hmm, OK. So this relies on the fact that because the
> init_watchdog_freq_notifier() is at "core" level that it'll pretty
> much init before any of the other things. This type of thing is always
> a little fragile but I guess it's OK.
> 
> What about if the order is different though? Again, I haven't traced
> through it all, but I'm thinking:
> 
> # core_initcall
> init_watchdog_freq_notifier()
> # device_initcall
> <cpufreq driver policy is created>
> 
> Now hardlockup_detector_perf_adjust_period() will be called before
> armv8_pmu_driver_init(). Is that a problem?

it doesn't matter, since in this situation:

# device_initcall
<cpufreq driver policy is created>
  hardlockup_detector_perf_adjust_period()
    if (!event) // armv8_pmu_driver_init() not called, not event created
      return;
# later after armv8_pmu_driver_init()
hw_nmi_get_sample_period() // since cpufreq already been registered, we can
                           // get the accurate cpufreq

> 
> I guess even worse would be if async probe was on and the cpufreq
> driver and PMU driver were running at the same time?
> 

I think there're 2 cases here:
1) cpufreq initialized before hw_nmi_get_sample_period() called, in which
   case the cpufreq is accurate when create the event
2) hw_nmi_get_sample_period() called before cpufreq driver's initialized,
   in which case we rely on the notifier call to adjust the event's period

May need further check to see if there's any race window here, but the worst
case will be using the inaccurate cpufreq (5GHz) for calculating the period.

Thanks.