[PATCH v3 0/5] Refactor thermal pressure update to avoid code duplication

Lukasz Luba lukasz.luba at arm.com
Fri Nov 5 09:26:32 PDT 2021


Hi Steev,

On 11/5/21 3:39 PM, Steev Klimaszewski wrote:
> Hi Lukasz,
> 

[snip]

> I've been testing this patchset on the Lenovo Yoga C630, and today while 
> compiling alacritty and running an apt-get full-upgrade, I found the 
> following in dmesg output:

Thank you for testing and sending feedback!

Are you using a mainline kernel or you applied on some vendor production
kernel this patch set? I need to exclude a different code base
from the equation, especially to the arch_topology.c init code.

> 
> [  194.343903] ------------[ cut here ]------------
> [  194.343912] WARNING: CPU: 4 PID: 192 at 
> drivers/base/arch_topology.c:188 
> topology_update_thermal_pressure+0xe4/0x100
> [  194.343928] Modules linked in: aes_ce_ccm snd_seq_dummy snd_hrtimer 
> snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep 
> cpufreq_ondemand cpufreq_conservative cpufreq_powersave 
> cpufreq_userspace lz4 lz4_compress zram zsmalloc q6asm_dai q6routing 
> q6afe_dai q6adm q6asm q6afe q6dsp_common snd_soc_wsa881x q6core 
> regmap_sdw snd_soc_wcd934x gpio_wcd934x soundwire_qcom snd_soc_wcd_mbhc 
> wcd934x regmap_slimbus uvcvideo videobuf2_vmalloc venus_enc venus_dec 
> videobuf2_dma_contig videobuf2_memops qrtr_smd fastrpc apr binfmt_misc 
> nls_ascii nls_cp437 vfat fat snd_soc_sdm845 snd_soc_rt5663 
> snd_soc_qcom_common pm8941_pwrkey joydev snd_soc_rl6231 aes_ce_blk 
> qcom_spmi_adc5 soundwire_bus qcom_vadc_common crypto_simd 
> qcom_spmi_temp_alarm snd_soc_core cryptd hci_uart snd_compress btqca 
> industrialio aes_ce_cipher snd_pcm_dmaengine btrtl crct10dif_ce btbcm 
> ghash_ce snd_pcm btintel gf128mul sha2_ce snd_timer venus_core bluetooth 
> snd v4l2_mem2mem sha256_arm64 videobuf2_v4l2 videobuf2_common soundcore
> [  194.344007]  sha1_ce videodev ecdh_generic ecc mc ath10k_snoc 
> ath10k_core hid_multitouch ath mac80211 qcom_rng libarc4 qcom_q6v5_mss 
> cfg80211 sg rfkill qcom_q6v5_pas qcom_pil_info slim_qcom_ngd_ctrl 
> qcom_wdt pdr_interface qcom_q6v5 evdev rmtfs_mem slimbus qcom_sysmon 
> fuse configfs qrtr ip_tables x_tables autofs4 ext4 mbcache jbd2 
> panel_simple rtc_pm8xxx msm llcc_qcom ocmem gpu_sched ti_sn65dsi86 
> drm_dp_aux_bus drm_kms_helper drm camcc_sdm845 ipa qcom_common 
> qmi_helpers mdt_loader gpio_keys pwm_bl
> [  194.344056] CPU: 4 PID: 192 Comm: kworker/4:1H Not tainted 5.15.0 #9
> [  194.344060] Hardware name: LENOVO 81JL/LNVNB161216, BIOS 
> 9UCN33WW(V2.06) 06/ 4/2019
> [  194.344062] Workqueue: events_highpri qcom_lmh_dcvs_poll
> [  194.344068] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS 
> BTYPE=--)
> [  194.344070] pc : topology_update_thermal_pressure+0xe4/0x100
> [  194.344073] lr : topology_update_thermal_pressure+0x30/0x100
> [  194.344075] sp : ffff800014043d10
> [  194.344076] x29: ffff800014043d10 x28: 0000000000000000 x27: 
> 0000000000000000
> [  194.344080] x26: ffff759c40b66974 x25: ffff759db37a2405 x24: 
> ffff759c40e83358
> [  194.344084] x23: 0000000000000000 x22: ffffb2699eafe1d8 x21: 
> 00000000002d1e00
> [  194.344087] x20: ffff759c49f5bc20 x19: 000000000000b8cc x18: 
> 0000000000000000
> [  194.344090] x17: 2f756e672d78756e x16: ffffb2699d7d83d0 x15: 
> 0000000000000000
> [  194.344093] x14: 0000000000000000 x13: 0000000000000030 x12: 
> 0101010101010101
> [  194.344096] x11: 7f7f7f7f7f7f7f7f x10: feff68716f676668 x9 : 
> ffffb2699dd66a58
> [  194.344099] x8 : fefefefefefefeff x7 : 000000000000000f x6 : 
> 0000000000000002
> [  194.344102] x5 : ffffc33415124000 x4 : 0000000000000400 x3 : 
> 0000000000000b19
> [  194.344105] x2 : 0000000000000b8c x1 : ffffb2699e678f40 x0 : 
> ffffb2699e678f48
> [  194.344108] Call trace:
> [  194.344110]  topology_update_thermal_pressure+0xe4/0x100
> [  194.344113]  qcom_lmh_dcvs_notify+0xc8/0x160
> [  194.344115]  qcom_lmh_dcvs_poll+0x20/0x2c
> [  194.344116]  process_one_work+0x1f4/0x490
> [  194.344120]  worker_thread+0x188/0x504
> [  194.344121]  kthread+0x12c/0x140
> [  194.344125]  ret_from_fork+0x10/0x20
> [  194.344128] ---[ end trace bd0039c4fb892d5b ]---

[snip]

That's interesting why we hit this. I should have added info about
those two values, which are compared.

Could you make this change and try it again, please?
We would know the problematic values, which triggered this.
---------------------8<-----------------------------------
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index db18d79065fe..0d8db0927041 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -185,8 +185,11 @@ void topology_update_thermal_pressure(const struct 
cpumask *cpus,
         /* Convert to MHz scale which is used in 'freq_factor' */
         capped_freq /= 1000;

-       if (WARN_ON(max_freq < capped_freq))
+       if (max_freq < capped_freq) {
+               pr_warn("THERMAL_PRESSURE: max_freq (%lu) < capped_freq 
(%lu) for CPUs [%*pbl]\n",
+                       max_freq, capped_freq, cpumask_pr_args(cpus));
                 return;
+       }

         capacity = mult_frac(capped_freq, max_capacity, max_freq);

------------------------------>8---------------------------

Could you also dump for me the cpufreq and capacity sysfs content?
$ grep . /sys/devices/system/cpu/cpu*/cpufreq/*
$ grep . /sys/devices/system/cpu/cpu*/cpu_capacity

Regards,
Lukasz



More information about the linux-arm-kernel mailing list