From hujy652 at gmail.com Sat Nov 1 01:33:37 2025
From: hujy652 at gmail.com (Zhi-Jun You)
Date: Sat, 1 Nov 2025 16:33:37 +0800
Subject: [PATCH ath-next] wifi: ath10k: simplify ath10k_htt_tx_mgmt_inc_pending
In-Reply-To: <6de0a467-14f3-43e1-952c-b8cc7eb4801c@oss.qualcomm.com>
References: <20251031111639.406873-1-hujy652@gmail.com>
 <6de0a467-14f3-43e1-952c-b8cc7eb4801c@oss.qualcomm.com>
Message-ID:

On Fri, Oct 31, 2025 at 11:00 PM Jeff Johnson wrote:
>
> On 10/31/2025 4:16 AM, Zhi-Jun You wrote:
> > Remove is_mgmt from ath10k_htt_tx_mgmt_inc_pending and make sure we only
> > call it when it's a mgmt frame.
>
> This fails to describe WHY the patch is needed
>
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

Hi Jeff,

My apologies.
I will try to describe it in this mail and update it in v2 if it looks
good to you.

ath10k_htt_tx_mgmt_inc_pending() is called in ath10k_mac_tx_push_txq()
and ath10k_mac_op_tx().

ath10k_mac_tx_push_txq() checks is_mgmt before calling
ath10k_htt_tx_mgmt_inc_pending(), but there is another is_mgmt check
inside the function, which looks redundant.
The function name itself already indicates that it is for mgmt frames
only.

This patch removes the is_mgmt check in ath10k_htt_tx_mgmt_inc_pending()
and adds an is_mgmt check in ath10k_mac_op_tx() to make sure it is only
called for mgmt frames.

Thanks for your time.
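The change described above can be sketched with a toy model. This is only
an illustration under stated assumptions: struct toy_htt, its
max_pending_mgmt_tx limit and the toy_* functions are hypothetical
stand-ins, not the ath10k code, and the real helper may take further
parameters and perform further checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the HTT TX state; not struct ath10k_htt. */
struct toy_htt {
	int num_pending_mgmt_tx;
	int max_pending_mgmt_tx;	/* assumed limit, for illustration only */
};

/* Before: the helper re-checks is_mgmt even if the caller already did. */
int toy_mgmt_inc_pending_old(struct toy_htt *htt, bool is_mgmt)
{
	if (!is_mgmt)
		return 0;
	if (htt->num_pending_mgmt_tx >= htt->max_pending_mgmt_tx)
		return -EBUSY;
	htt->num_pending_mgmt_tx++;
	return 0;
}

/* After: the helper assumes a mgmt frame; the caller owns the check. */
int toy_mgmt_inc_pending_new(struct toy_htt *htt)
{
	if (htt->num_pending_mgmt_tx >= htt->max_pending_mgmt_tx)
		return -EBUSY;
	htt->num_pending_mgmt_tx++;
	return 0;
}

/* Caller pattern after the change: check is_mgmt, then call the helper. */
int toy_tx(struct toy_htt *htt, bool is_mgmt)
{
	if (is_mgmt)
		return toy_mgmt_inc_pending_new(htt);
	return 0;
}
```

With the check hoisted into the caller, a data frame never reaches the
helper, so the pending counter only moves for management frames.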
Best regards,
Zhi-Jun

From jeff.johnson at oss.qualcomm.com Mon Nov 3 07:23:59 2025
From: jeff.johnson at oss.qualcomm.com (Jeff Johnson)
Date: Mon, 3 Nov 2025 07:23:59 -0800
Subject: pull-request: ath-next-20251103
Message-ID: <30db167b-0ebb-40f3-8beb-e3966a4922f0@oss.qualcomm.com>

The following changes since commit 94aced6ed9e2630bae0b5631e384a5302c4b6783:

  Merge tag 'wireless-next-2025-09-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next (2025-09-26 14:27:28 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath.git tags/ath-next-20251103

for you to fetch changes up to 059ca8fd692b67a77fb89e9d4e8f57cf08e32b08:

  wifi: ath10k: use = {} to initialize bmi_target_info instead of memset (2025-10-30 14:55:08 -0700)

----------------------------------------------------------------
ath.git patches for v6.19

Highlights for some specific drivers include:

ath10k:
Add support for Factory Test TLV commands

ath11k:
Add support for Tx Power insertion

ath12k:
Add support for BSS color change

And of course there is the usual set of cleanups and bug fixes across
the entire family of "ath" drivers.

We do expect to have one more pull request before the v6.19 merge
window to pull in the refactored ath12k driver from the ath12k-ng
branch.

---
Note to maintainers:
There is a trivial conflict between two patches:

From ath-current => wireless => net
9222582ec524 ("Revert "wifi: ath12k: Fix missing station power save configuration"")

From ath-next => wireless-next => net-next
6917e268c433 ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon")

The resolution is to take both hunks, ordering them in reverse xmas tree style.
----------------------------------------------------------------
Abdun Nihaal (1):
      wifi: ath12k: fix potential memory leak in ath12k_wow_arp_ns_offload()

Aditya Kumar Singh (5):
      wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon
      wifi: ath11k: relocate some Tx power related functions in mac.c
      wifi: ath11k: wrap ath11k_mac_op_get_txpower() with lock-aware internal helper
      wifi: ath11k: add support for Tx Power insertion in RRM action frame
      wifi: ath11k: advertise NL80211_FEATURE_TX_POWER_INSERTION

Baochen Qiang (7):
      wifi: ath11k: restore register window after global reset
      wifi: ath12k: fix VHT MCS assignment
      wifi: ath11k: fix VHT MCS assignment
      wifi: ath11k: fix peer HE MCS assignment
      wifi: ath12k: restore register window after global reset
      wifi: ath12k: fix reusing m3 memory
      wifi: ath12k: fix error handling in creating hardware group

Dr. David Alan Gilbert (1):
      wifi: wcn36xx: Remove unused wcn36xx_smd_update_scan_params

Jeff Johnson (3):
      wifi: ath11k: Remove struct wmi_bcn_send_from_host_cmd
      wifi: ath12k: Remove struct wmi_bcn_send_from_host_cmd
      wifi: ath11k: Correctly use "ab" macro parameter

Kang Yang (1):
      wifi: ath10k: move recovery check logic into a new work

Loic Poulain (1):
      wifi: ath10k: Support for FTM TLV test commands

Muna Sinada (6):
      wifi: ath12k: generalize GI and LTF fixed rate functions
      wifi: ath12k: add EHT rate handling to existing set rate functions
      wifi: ath12k: Add EHT MCS/NSS rates to Peer Assoc
      wifi: ath12k: Add EHT fixed GI/LTF
      wifi: ath12k: add EHT rates to ath12k_mac_op_set_bitrate_mask()
      wifi: ath12k: Set EHT fixed rates for associated STAs

Pradeep Kumar Chitrapu (1):
      wifi: ath12k: fix TX and RX MCS rate configurations in HE mode

Rameshkumar Sundaram (2):
      wifi: ath12k: enforce vdev limit in ath12k_mac_vdev_create()
      wifi: ath12k: unassign arvif on scan vdev create failure

Sarika Sharma (3):
      wifi: ath12k: Fix MSDU buffer types handling in RX error path
      wifi: ath12k: track dropped MSDU buffer type packets in REO exception
        ring
      wifi: ath12k: Assert base_lock is held before allocating REO update element

Takashi Iwai (1):
      wifi: ath12k: Add MODULE_FIRMWARE() entries

Thiraviyam Mariyappan (1):
      wifi: ath12k: Fix NSS value update in ext_rx_stats

Wei Zhang (1):
      wifi: ath12k: add support for BSS color change

Zhongqiu Han (2):
      wifi: ath10k: use = {} to initialize pm_qos_request instead of memset
      wifi: ath10k: use = {} to initialize bmi_target_info instead of memset

 drivers/net/wireless/ath/ath10k/core.c       |  28 +-
 drivers/net/wireless/ath/ath10k/core.h       |   6 +-
 drivers/net/wireless/ath/ath10k/mac.c        |   2 +-
 drivers/net/wireless/ath/ath10k/testmode.c   | 253 +++++++--
 drivers/net/wireless/ath/ath10k/testmode_i.h |  15 +
 drivers/net/wireless/ath/ath10k/wmi.h        |  19 +-
 drivers/net/wireless/ath/ath11k/hal.h        |  38 +-
 drivers/net/wireless/ath/ath11k/mac.c        | 455 +++++++++++-----
 drivers/net/wireless/ath/ath11k/pci.c        |  20 +-
 drivers/net/wireless/ath/ath11k/pci.h        |  18 +-
 drivers/net/wireless/ath/ath11k/wmi.c        |  20 +-
 drivers/net/wireless/ath/ath11k/wmi.h        |  18 +-
 drivers/net/wireless/ath/ath12k/core.c       |  22 +-
 drivers/net/wireless/ath/ath12k/core.h       |   3 +
 drivers/net/wireless/ath/ath12k/debugfs.c    |   5 +-
 drivers/net/wireless/ath/ath12k/dp_mon.c     |  19 +-
 drivers/net/wireless/ath/ath12k/dp_rx.c      |  74 ++-
 drivers/net/wireless/ath/ath12k/hal_rx.c     |  10 +-
 drivers/net/wireless/ath/ath12k/mac.c        | 755 ++++++++++++++++++++++-----
 drivers/net/wireless/ath/ath12k/mac.h        |  14 +-
 drivers/net/wireless/ath/ath12k/pci.c        |  24 +-
 drivers/net/wireless/ath/ath12k/qmi.c        |  11 +-
 drivers/net/wireless/ath/ath12k/qmi.h        |   5 +-
 drivers/net/wireless/ath/ath12k/wmi.c        |  86 ++-
 drivers/net/wireless/ath/ath12k/wmi.h        |  55 +-
 drivers/net/wireless/ath/ath12k/wow.c        |   1 +
 drivers/net/wireless/ath/wcn36xx/hal.h       |  74 ---
 drivers/net/wireless/ath/wcn36xx/smd.c       |  60 ---
 drivers/net/wireless/ath/wcn36xx/smd.h       |   1 -
 29 files changed, 1554 insertions(+), 557 deletions(-)

From sfr at canb.auug.org.au Wed Nov 5 16:56:36 2025
From: sfr at canb.auug.org.au (Stephen
 Rothwell)
Date: Thu, 6 Nov 2025 11:56:36 +1100
Subject: linux-next: manual merge of the ath-next tree with the ath tree
In-Reply-To: <20251030113037.1932c6d2@canb.auug.org.au>
References: <20251030113037.1932c6d2@canb.auug.org.au>
Message-ID: <20251106115636.7ab861b3@canb.auug.org.au>

Hi all,

On Thu, 30 Oct 2025 11:30:37 +1100 Stephen Rothwell wrote:
>
> Today's linux-next merge of the ath-next tree got a conflict in:
>
>   drivers/net/wireless/ath/ath12k/mac.c
>
> between commit:
>
>   9222582ec524 ("Revert "wifi: ath12k: Fix missing station power save configuration"")
>
> from the ath tree and commit:
>
>   6917e268c433 ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon")
>
> from the ath-next tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging. You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc drivers/net/wireless/ath/ath12k/mac.c
> index db351c922018,e79d457e3c03..000000000000
> --- a/drivers/net/wireless/ath/ath12k/mac.c
> +++ b/drivers/net/wireless/ath/ath12k/mac.c
> @@@ -4209,7 -4286,7 +4267,8 @@@ static void ath12k_mac_bss_info_changed
> {
> 	struct ath12k_vif *ahvif = arvif->ahvif;
> 	struct ieee80211_vif *vif = ath12k_ahvif_to_vif(ahvif);
> +	struct ieee80211_vif_cfg *vif_cfg = &vif->cfg;
> +	struct ath12k_link_vif *tx_arvif;
> 	struct cfg80211_chan_def def;
> 	u32 param_id, param_value;
> 	enum nl80211_band band;

This is now a conflict between the wireless-next tree and the wireless
tree.

-- 
Cheers,
Stephen Rothwell

From devnull+david.ixit.cz at kernel.org Mon Nov 10 06:26:24 2025
From: devnull+david.ixit.cz at kernel.org (David Heidelberg via B4 Relay)
Date: Mon, 10 Nov 2025 15:26:24 +0100
Subject: [PATCH v2 1/3] dt-bindings: wireless: ath10k: Introduce quirk to skip host cap QMI requests
In-Reply-To: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
References: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
Message-ID: <20251110-skip-host-cam-qmi-req-v2-1-0daf485a987a@ixit.cz>

From: Amit Pundir

Introducing this quirk to skip host capability QMI request for the firmware
versions which do not support this feature.

Signed-off-by: Amit Pundir
Signed-off-by: David Heidelberg
---
 Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
index f2440d39b7ebc..5120b3589ab57 100644
--- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
+++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
@@ -171,6 +171,12 @@ properties:
       Quirk specifying that the firmware expects the 8bit version
       of the host capability QMI request
 
+  qcom,snoc-host-cap-skip-quirk:
+    type: boolean
+    description:
+      Quirk specifying that the firmware wants to skip the host
+      capability QMI request
+
   qcom,xo-cal-data:
     $ref: /schemas/types.yaml#/definitions/uint32
     description:

-- 
2.51.0

From devnull+david.ixit.cz at kernel.org Mon Nov 10 06:26:23 2025
From: devnull+david.ixit.cz at kernel.org (David Heidelberg via B4 Relay)
Date: Mon, 10 Nov 2025 15:26:23 +0100
Subject: [PATCH v2 0/3] ath10k: Introduce a devicetree quirk to skip host cap QMI requests
Message-ID: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>

This quirk is used so far for Xiaomi Poco F1.

I'm resending it after ~ 4 years since initial send due to Snapdragon
845 being one of best supported platform for mobile phones running
Linux, so it would be shame to not have shiny support.

I'm very much open to suggestions how to solve this in a different way,
as the original discussion thread got quiet, see
https://lore.kernel.org/all/b796bfee-b753-479a-a8d6-ba1fe3ee6222@ixit.cz/

There could be other devices in need of this quirk, but if they're not,
we could make it compatible specific quirk.

Until merged, available also at:
  https://gitlab.com/dhxx/linux/-/commits/b4/skip-host-cam-qmi-req

Signed-off-by: David Heidelberg
---
Amit Pundir (3):
      dt-bindings: wireless: ath10k: Introduce quirk to skip host cap QMI requests
      ath10k: Introduce a devicetree quirk to skip host cap QMI requests
      arm64: dts: qcom: sdm845-xiaomi-beryllium: Enable ath10k host-cap skip quirk

 .../devicetree/bindings/net/wireless/qcom,ath10k.yaml       |  6 ++++++
 .../arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi |  2 ++
 drivers/net/wireless/ath/ath10k/qmi.c                       | 13 ++++++++++---
 drivers/net/wireless/ath/ath10k/snoc.c                      |  3 +++
 drivers/net/wireless/ath/ath10k/snoc.h                      |  1 +
 5 files changed, 22 insertions(+), 3 deletions(-)
---
base-commit: ab40c92c74c6b0c611c89516794502b3a3173966
change-id: 20251110-skip-host-cam-qmi-req-e155628ebc39

Best regards,
-- 
David Heidelberg

From devnull+david.ixit.cz at kernel.org Mon Nov 10 06:26:26 2025
From: devnull+david.ixit.cz at kernel.org (David Heidelberg via B4 Relay)
Date: Mon, 10 Nov 2025 15:26:26 +0100
Subject: [PATCH v2 3/3] arm64: dts: qcom: sdm845-xiaomi-beryllium: Enable ath10k host-cap skip quirk
In-Reply-To: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
References: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
Message-ID: <20251110-skip-host-cam-qmi-req-v2-3-0daf485a987a@ixit.cz>

From: Amit Pundir

The WiFi firmware used on Xiaomi PocoPhone F1 (beryllium) phone doesn't
support the host-capability QMI request, hence enable the
skip quirk for this device.

Signed-off-by: Amit Pundir
Signed-off-by: David Heidelberg
---
 arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
index 785006a15e979..a3bfcf56ad3c8 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
@@ -636,4 +636,6 @@ &wifi {
 	vdd-1.3-rfa-supply = <&vreg_l17a_1p3>;
 	vdd-3.3-ch0-supply = <&vreg_l25a_3p3>;
 	vdd-3.3-ch1-supply = <&vreg_l23a_3p3>;
+
+	qcom,snoc-host-cap-skip-quirk;
 };

-- 
2.51.0

From devnull+david.ixit.cz at kernel.org Mon Nov 10 06:26:25 2025
From: devnull+david.ixit.cz at kernel.org (David Heidelberg via B4 Relay)
Date: Mon, 10 Nov 2025 15:26:25 +0100
Subject: [PATCH v2 2/3] ath10k: Introduce a devicetree quirk to skip host cap QMI requests
In-Reply-To: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
References: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
Message-ID: <20251110-skip-host-cam-qmi-req-v2-2-0daf485a987a@ixit.cz>

From: Amit Pundir

There are firmware versions which do not support host capability QMI
request. We suspect either the host cap is not implemented or there may
be firmware specific issues, but apparently there seem to be a generation
of firmware that has this particular behavior.

For example, firmware build on Xiaomi Poco F1 (sdm845) phone:
"QC_IMAGE_VERSION_STRING=WLAN.HL.2.0.c3-00257-QCAHLSWMTPLZ-1"

If we do not skip the host cap QMI request on Poco F1, then we get a
QMI_ERR_MALFORMED_MSG_V01 error message in the
ath10k_qmi_host_cap_send_sync(). But this error message is not fatal to
the firmware nor to the ath10k driver and we can still bring up the WiFi
services successfully if we just ignore it.

Hence introducing this DeviceTree quirk to skip host capability QMI
request for the firmware versions which do not support this feature.

Suggested-by: Bjorn Andersson
Signed-off-by: Amit Pundir
---
 drivers/net/wireless/ath/ath10k/qmi.c  | 13 ++++++++++---
 drivers/net/wireless/ath/ath10k/snoc.c |  3 +++
 drivers/net/wireless/ath/ath10k/snoc.h |  1 +
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c
index 8275345631a0b..ba5ab942e7ef8 100644
--- a/drivers/net/wireless/ath/ath10k/qmi.c
+++ b/drivers/net/wireless/ath/ath10k/qmi.c
@@ -808,6 +808,7 @@ ath10k_qmi_ind_register_send_sync_msg(struct ath10k_qmi *qmi)
 static void ath10k_qmi_event_server_arrive(struct ath10k_qmi *qmi)
 {
 	struct ath10k *ar = qmi->ar;
+	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
 	int ret;
 
 	ret = ath10k_qmi_ind_register_send_sync_msg(qmi);
@@ -819,9 +820,15 @@ static void ath10k_qmi_event_server_arrive(struct ath10k_qmi *qmi)
 		return;
 	}
 
-	ret = ath10k_qmi_host_cap_send_sync(qmi);
-	if (ret)
-		return;
+	/*
+	 * Skip the host capability request for the firmware versions which
+	 * do not support this feature.
+	 */
+	if (!test_bit(ATH10K_SNOC_FLAG_SKIP_HOST_CAP_QUIRK, &ar_snoc->flags)) {
+		ret = ath10k_qmi_host_cap_send_sync(qmi);
+		if (ret)
+			return;
+	}
 
 	ret = ath10k_qmi_msa_mem_info_send_sync_msg(qmi);
 	if (ret)
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index b3f6424c17d36..4def51cac2ed5 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -1346,6 +1346,9 @@ static void ath10k_snoc_quirks_init(struct ath10k *ar)
 
 	if (of_property_read_bool(dev->of_node, "qcom,snoc-host-cap-8bit-quirk"))
 		set_bit(ATH10K_SNOC_FLAG_8BIT_HOST_CAP_QUIRK, &ar_snoc->flags);
+
+	if (of_property_read_bool(dev->of_node, "qcom,snoc-host-cap-skip-quirk"))
+		set_bit(ATH10K_SNOC_FLAG_SKIP_HOST_CAP_QUIRK, &ar_snoc->flags);
 }
 
 int ath10k_snoc_fw_indication(struct ath10k *ar, u64 type)
diff --git a/drivers/net/wireless/ath/ath10k/snoc.h b/drivers/net/wireless/ath/ath10k/snoc.h
index d4bce17076960..403f35af34c5d 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.h
+++ b/drivers/net/wireless/ath/ath10k/snoc.h
@@ -50,6 +50,7 @@ enum ath10k_snoc_flags {
 	ATH10K_SNOC_FLAG_MODEM_STOPPED,
 	ATH10K_SNOC_FLAG_RECOVERY,
 	ATH10K_SNOC_FLAG_8BIT_HOST_CAP_QUIRK,
+	ATH10K_SNOC_FLAG_SKIP_HOST_CAP_QUIRK,
 };
 
 struct clk_bulk_data;

-- 
2.51.0

From jeff.johnson at oss.qualcomm.com Mon Nov 10 12:04:30 2025
From: jeff.johnson at oss.qualcomm.com (Jeff Johnson)
Date: Mon, 10 Nov 2025 12:04:30 -0800
Subject: [PATCH v2 0/3] ath10k: Introduce a devicetree quirk to skip host cap QMI requests
In-Reply-To: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
References: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
Message-ID: <2b34ceae-5e31-4dba-93e5-3fa35754fab6@oss.qualcomm.com>

On 11/10/2025 6:26 AM, David Heidelberg via B4 Relay wrote:
> This quirk is used so far for Xiaomi Poco F1.
>
> I'm resending it after ~ 4 years since initial send due to Snapdragon
> 845 being one of best supported platform for mobile phones running
> Linux, so it would be shame to not have shiny support.
>
> I'm very much open to suggestions how to solve this in a different way,
> as the original discussion thread got quiet, see
> https://lore.kernel.org/all/b796bfee-b753-479a-a8d6-ba1fe3ee6222@ixit.cz/
>
> There could be other devices in need of this quirk, but if they're not,
> we could make it compatible specific quirk.
>
> Until merged, available also at:
>   https://gitlab.com/dhxx/linux/-/commits/b4/skip-host-cam-qmi-req
>
> Signed-off-by: David Heidelberg
> ---
> Amit Pundir (3):
>       dt-bindings: wireless: ath10k: Introduce quirk to skip host cap QMI requests
>       ath10k: Introduce a devicetree quirk to skip host cap QMI requests
>       arm64: dts: qcom: sdm845-xiaomi-beryllium: Enable ath10k host-cap skip quirk
>
>  .../devicetree/bindings/net/wireless/qcom,ath10k.yaml       |  6 ++++++
>  .../arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi |  2 ++
>  drivers/net/wireless/ath/ath10k/qmi.c                       | 13 ++++++++++---
>  drivers/net/wireless/ath/ath10k/snoc.c                      |  3 +++
>  drivers/net/wireless/ath/ath10k/snoc.h                      |  1 +
>  5 files changed, 22 insertions(+), 3 deletions(-)
> ---
> base-commit: ab40c92c74c6b0c611c89516794502b3a3173966
> change-id: 20251110-skip-host-cam-qmi-req-e155628ebc39
>
> Best regards,

The original thread predates me becoming an ath.git maintainer.
Just for my information, is the firmware and board files for this platform
available in linux-firmware? Or does it leverage the files already present
from the original (Android?) installation?

I ask because the alternative solution suggested by Kalle would require
modification of the board file on the device, and that seems more of a hassle
than just modifying the DT.

So I'm personally OK with this suggested approach.

/jeff

From dmitry.baryshkov at oss.qualcomm.com Mon Nov 10 12:35:52 2025
From: dmitry.baryshkov at oss.qualcomm.com (Dmitry Baryshkov)
Date: Mon, 10 Nov 2025 22:35:52 +0200
Subject: [PATCH v2 1/3] dt-bindings: wireless: ath10k: Introduce quirk to skip host cap QMI requests
In-Reply-To: <20251110-skip-host-cam-qmi-req-v2-1-0daf485a987a@ixit.cz>
References: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
 <20251110-skip-host-cam-qmi-req-v2-1-0daf485a987a@ixit.cz>
Message-ID:

On Mon, Nov 10, 2025 at 03:26:24PM +0100, David Heidelberg via B4 Relay wrote:
> From: Amit Pundir
>
> Introducing this quirk to skip host capability QMI request for the firmware
> versions which do not support this feature.

If it is a firmware mis-feature, why don't we describe it in the
firmware-N.bin file?

>
> Signed-off-by: Amit Pundir
> Signed-off-by: David Heidelberg
> ---
>  Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml | 6 ++++++
>  1 file changed, 6 insertions(+)

-- 
With best wishes
Dmitry

From dmitry.baryshkov at oss.qualcomm.com Mon Nov 10 12:41:14 2025
From: dmitry.baryshkov at oss.qualcomm.com (Dmitry Baryshkov)
Date: Mon, 10 Nov 2025 22:41:14 +0200
Subject: [PATCH v2 0/3] ath10k: Introduce a devicetree quirk to skip host cap QMI requests
In-Reply-To: <2b34ceae-5e31-4dba-93e5-3fa35754fab6@oss.qualcomm.com>
References: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz>
 <2b34ceae-5e31-4dba-93e5-3fa35754fab6@oss.qualcomm.com>
Message-ID:

On Mon, Nov 10, 2025 at 12:04:30PM -0800, Jeff Johnson wrote:
> On 11/10/2025 6:26 AM, David Heidelberg via B4 Relay wrote:
> > This quirk is used so far for Xiaomi Poco F1.
> >
> > I'm resending it after ~ 4 years since initial send due to Snapdragon
> > 845 being one of best supported platform for mobile phones running
> > Linux, so it would be shame to not have shiny support.
> >
> > I'm very much open to suggestions how to solve this in a different way,
> > as the original discussion thread got quiet, see
> > https://lore.kernel.org/all/b796bfee-b753-479a-a8d6-ba1fe3ee6222@ixit.cz/
> >
> > There could be other devices in need of this quirk, but if they're not,
> > we could make it compatible specific quirk.
> >
> > Until merged, available also at:
> >   https://gitlab.com/dhxx/linux/-/commits/b4/skip-host-cam-qmi-req
> >
> > Signed-off-by: David Heidelberg
> > ---
> > Amit Pundir (3):
> >       dt-bindings: wireless: ath10k: Introduce quirk to skip host cap QMI requests
> >       ath10k: Introduce a devicetree quirk to skip host cap QMI requests
> >       arm64: dts: qcom: sdm845-xiaomi-beryllium: Enable ath10k host-cap skip quirk
> >
> >  .../devicetree/bindings/net/wireless/qcom,ath10k.yaml       |  6 ++++++
> >  .../arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi |  2 ++
> >  drivers/net/wireless/ath/ath10k/qmi.c                       | 13 ++++++++++---
> >  drivers/net/wireless/ath/ath10k/snoc.c                      |  3 +++
> >  drivers/net/wireless/ath/ath10k/snoc.h                      |  1 +
> >  5 files changed, 22 insertions(+), 3 deletions(-)
> > ---
> > base-commit: ab40c92c74c6b0c611c89516794502b3a3173966
> > change-id: 20251110-skip-host-cam-qmi-req-e155628ebc39
> >
> > Best regards,
>
> The original thread predates me becoming an ath.git maintainer.
> Just for my information, is the firmware and board files for this platform
> available in linux-firmware? Or does it leverage the files already present
> from the original (Android?) installation?
>
> I ask because the alternative solution suggested by Kalle would require
> modification of the board file on the device, and that seems more of a hassle
> than just modifying the DT.

I think this should go to the firmware-N file. SNOC platforms now allow
per-platform firmware description files, so it's possible to describe
quirks for the particular firmware file.

>
> So I'm personally OK with this suggested approach.
>
> /jeff

-- 
With best wishes
Dmitry

From devnull+david.ixit.cz at kernel.org Tue Nov 11 04:34:21 2025
From: devnull+david.ixit.cz at kernel.org (David Heidelberg via B4 Relay)
Date: Tue, 11 Nov 2025 13:34:21 +0100
Subject: [PATCH 0/2] ath10k: Introduce a firmware quirk to skip host cap QMI requests
Message-ID: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz>

This is a follow-up to the recent discussion at
https://lore.kernel.org/all/20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz/
doing the workaround directly in the firmware, so we don't pollute the
device tree.

I added the change needed for the Xiaomi Poco F1, so it's grouped, but
I'm open to getting the first commit in and sending the second later,
when all the firmware and tooling changes land.

References:
 - https://gitlab.com/kernel-firmware/linux-firmware/-/merge_requests/780
 - https://github.com/qca/qca-swiss-army-knife/pull/13

Signed-off-by: David Heidelberg
---
David Heidelberg (2):
      ath10k: Introduce a firmware quirk to skip host cap QMI requests
      arm64: dts: qcom: xiaomi-beryllium: Add firmware-name qualifier to WiFi node

 .../arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi |  2 ++
 drivers/net/wireless/ath/ath10k/core.c                      |  1 +
 drivers/net/wireless/ath/ath10k/core.h                      |  3 +++
 drivers/net/wireless/ath/ath10k/qmi.c                       | 13 ++++++++++---
 4 files changed, 16 insertions(+), 3 deletions(-)
---
base-commit: 2666975a8905776d306bee01c5d98a0395bda1c9
change-id: 20251111-xiaomi-beryllium-firmware-d8134ce67fec

Best regards,
-- 
David Heidelberg

From devnull+david.ixit.cz at kernel.org Tue Nov 11 04:34:22 2025
From: devnull+david.ixit.cz at kernel.org (David Heidelberg via B4 Relay)
Date: Tue, 11 Nov 2025 13:34:22 +0100
Subject: [PATCH 1/2] ath10k: Introduce a firmware quirk to skip host cap QMI requests
In-Reply-To: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz>
References: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz>
Message-ID: <20251111-xiaomi-beryllium-firmware-v1-1-836b9c51ad86@ixit.cz>

From: David Heidelberg

There are firmware versions which do not support host capability QMI
request. We suspect either the host cap is not implemented or there may
be firmware specific issues, but apparently there seem to be a generation
of firmware that has this particular behavior.

For example, firmware build on Xiaomi Poco F1 (sdm845) phone:
"QC_IMAGE_VERSION_STRING=WLAN.HL.2.0.c3-00257-QCAHLSWMTPLZ-1"

If we do not skip the host cap QMI request on Xiaomi Poco F1, then we
get a QMI_ERR_MALFORMED_MSG_V01 error message in the
ath10k_qmi_host_cap_send_sync(). But this error message is not fatal to
the firmware nor to the ath10k driver and we can still bring up the WiFi
services successfully if we just ignore it.

Hence introducing this firmware quirk to skip host capability QMI
request for the firmware versions which do not support this feature.

Suggested-by: Dmitry Baryshkov
Signed-off-by: David Heidelberg
---
 drivers/net/wireless/ath/ath10k/core.c |  1 +
 drivers/net/wireless/ath/ath10k/core.h |  3 +++
 drivers/net/wireless/ath/ath10k/qmi.c  | 13 ++++++++++---
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 7c2939cbde5f0..7602631696798 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -773,6 +773,7 @@ static const char *const ath10k_core_fw_feature_str[] = {
 	[ATH10K_FW_FEATURE_SINGLE_CHAN_INFO_PER_CHANNEL] = "single-chan-info-per-channel",
 	[ATH10K_FW_FEATURE_PEER_FIXED_RATE] = "peer-fixed-rate",
 	[ATH10K_FW_FEATURE_IRAM_RECOVERY] = "iram-recovery",
+	[ATH10K_FW_FEATURE_NO_HOST_CAP_QMI_REQ] = "no-host-cap-qmi-req",
 };
 
 static unsigned int ath10k_core_get_fw_feature_str(char *buf,
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 73a9db302245d..b20541e4046f8 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -838,6 +838,9 @@ enum ath10k_fw_features {
 	/* Firmware support IRAM recovery */
 	ATH10K_FW_FEATURE_IRAM_RECOVERY = 22,
 
+	/* Firmware does not support host capability QMI request */
+	ATH10K_FW_FEATURE_NO_HOST_CAP_QMI_REQ = 23,
+
 	/* keep last */
 	ATH10K_FW_FEATURE_COUNT,
 };
diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c
index 8275345631a0b..5dc8ea39372c1 100644
--- a/drivers/net/wireless/ath/ath10k/qmi.c
+++ b/drivers/net/wireless/ath/ath10k/qmi.c
@@ -819,9 +819,16 @@ static void ath10k_qmi_event_server_arrive(struct ath10k_qmi *qmi)
 		return;
 	}
 
-	ret = ath10k_qmi_host_cap_send_sync(qmi);
-	if (ret)
-		return;
+	/*
+	 * Skip the host capability request for the firmware versions which
+	 * do not support this feature.
+	 */
+	if (!test_bit(ATH10K_FW_FEATURE_NO_HOST_CAP_QMI_REQ,
+		      ar->running_fw->fw_file.fw_features)) {
+		ret = ath10k_qmi_host_cap_send_sync(qmi);
+		if (ret)
+			return;
+	}
 
 	ret = ath10k_qmi_msa_mem_info_send_sync_msg(qmi);
 	if (ret)

-- 
2.51.0

From devnull+david.ixit.cz at kernel.org Tue Nov 11 04:34:23 2025
From: devnull+david.ixit.cz at kernel.org (David Heidelberg via B4 Relay)
Date: Tue, 11 Nov 2025 13:34:23 +0100
Subject: [PATCH 2/2] arm64: dts: qcom: xiaomi-beryllium: Add firmware-name qualifier to WiFi node
In-Reply-To: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz>
References: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz>
Message-ID: <20251111-xiaomi-beryllium-firmware-v1-2-836b9c51ad86@ixit.cz>

From: David Heidelberg

Add firmware-name property to the WiFi device tree node to specify
board-specific lookup directory.

Signed-off-by: David Heidelberg
---
 arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
index 785006a15e979..9b0b0446f4ad3 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
@@ -631,6 +631,8 @@ &wcd9340 {
 &wifi {
 	status = "okay";
 
+	firmware-name "sdm845/Xiaomi/beryllium";
+
 	vdd-0.8-cx-mx-supply = <&vreg_l5a_0p8>;
 	vdd-1.8-xo-supply = <&vreg_l7a_1p8>;
 	vdd-1.3-rfa-supply = <&vreg_l17a_1p3>;

-- 
2.51.0

From dmitry.baryshkov at oss.qualcomm.com Tue Nov 11 04:46:56 2025
From: dmitry.baryshkov at oss.qualcomm.com (Dmitry Baryshkov)
Date: Tue, 11 Nov 2025 14:46:56 +0200
Subject: [PATCH 2/2] arm64: dts: qcom: xiaomi-beryllium: Add firmware-name qualifier to WiFi node
In-Reply-To: <20251111-xiaomi-beryllium-firmware-v1-2-836b9c51ad86@ixit.cz>
References: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz>
 <20251111-xiaomi-beryllium-firmware-v1-2-836b9c51ad86@ixit.cz>
Message-ID:

On Tue, Nov 11, 2025 at 01:34:23PM +0100, David Heidelberg via B4 Relay wrote:
> From: David Heidelberg
>
> Add firmware-name property to the WiFi device tree node to specify
> board-specific lookup directory.
>
> Signed-off-by: David Heidelberg
> ---
>  arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
> index 785006a15e979..9b0b0446f4ad3 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
> @@ -631,6 +631,8 @@ &wcd9340 {
>  &wifi {
>  	status = "okay";
>
> +	firmware-name "sdm845/Xiaomi/beryllium";

This wasn't build-tested

> +
>  	vdd-0.8-cx-mx-supply = <&vreg_l5a_0p8>;
>  	vdd-1.8-xo-supply = <&vreg_l7a_1p8>;
>  	vdd-1.3-rfa-supply = <&vreg_l17a_1p3>;
>
> --
> 2.51.0
>
>

-- 
With best wishes
Dmitry

From david at ixit.cz Tue Nov 11 06:23:05 2025
From: david at ixit.cz (David Heidelberg)
Date: Tue, 11 Nov 2025 15:23:05 +0100
Subject: [PATCH 2/2] arm64: dts: qcom: xiaomi-beryllium: Add firmware-name qualifier to WiFi node
In-Reply-To:
References: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz>
 <20251111-xiaomi-beryllium-firmware-v1-2-836b9c51ad86@ixit.cz>
Message-ID: <5c6a1434-1f43-4434-b6ed-0c5b98ee8d2b@ixit.cz>

On 11/11/2025 13:46, Dmitry Baryshkov wrote:
> On Tue, Nov 11, 2025 at 01:34:23PM +0100, David Heidelberg via B4 Relay wrote:
>> From: David Heidelberg
>>
>> Add firmware-name property to the WiFi device tree node to specify
>> board-specific lookup directory.
>>
>> Signed-off-by: David Heidelberg
>> ---
>>  arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
>> index 785006a15e979..9b0b0446f4ad3 100644
>> --- a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi
>> @@ -631,6 +631,8 @@ &wcd9340 {
>>  &wifi {
>>  	status = "okay";
>>
>> +	firmware-name "sdm845/Xiaomi/beryllium";
>
> This wasn't build-tested

Sorry, I wanted to send it more as an RFC to get initial feedback. I have
a user with a Poco F1 who is willing to test the changes, so I should
have a new version with a T-b by EOD.

David

>
>> +
>>  	vdd-0.8-cx-mx-supply = <&vreg_l5a_0p8>;
>>  	vdd-1.8-xo-supply = <&vreg_l7a_1p8>;
>>  	vdd-1.3-rfa-supply = <&vreg_l17a_1p3>;
>>
>> --
>> 2.51.0
>>
>>
>

-- 
David Heidelberg

From jeff.johnson at oss.qualcomm.com Tue Nov 11 09:28:13 2025
From: jeff.johnson at oss.qualcomm.com (Jeff Johnson)
Date: Tue, 11 Nov 2025 09:28:13 -0800
Subject: pull-request: ath-next-20251111
Message-ID: <15a98cae-0274-45f4-9b8e-be6fa9720884@oss.qualcomm.com>

The following changes since commit 2f6adeaf92c4ea4adf5a91b87497ba13bb057996:

  Merge tag 'ath-next-20251103' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath into wireless-next (2025-11-05 16:29:11 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath.git tags/ath-next-20251111

for you to fetch changes up to 2977567b244f056d86658160659f06cd6c78ba3d:

  wifi: ath12k: Fix timeout error during beacon stats retrieval (2025-11-06 07:33:31 -0800)

----------------------------------------------------------------
ath.git patches for v6.19 (#2)

Just one 2-patch series for this PR.
Once pulled into wireless-next, ath-next will fast-forward, and that will provide the baseline for merging ath12k-ng into ath-next. ---------------------------------------------------------------- Manish Dharanenthiran (2): wifi: ath12k: Make firmware stats reset caller-driven wifi: ath12k: Fix timeout error during beacon stats retrieval drivers/net/wireless/ath/ath12k/core.c | 2 -- drivers/net/wireless/ath/ath12k/core.h | 1 - drivers/net/wireless/ath/ath12k/debugfs.c | 9 +++------ drivers/net/wireless/ath/ath12k/mac.c | 15 ++++++++++----- drivers/net/wireless/ath/ath12k/wmi.c | 12 +----------- 5 files changed, 14 insertions(+), 25 deletions(-) From konrad.dybcio at oss.qualcomm.com Wed Nov 12 01:18:25 2025 From: konrad.dybcio at oss.qualcomm.com (Konrad Dybcio) Date: Wed, 12 Nov 2025 10:18:25 +0100 Subject: [PATCH 2/2] arm64: dts: qcom: xiaomi-beryllium: Add firmware-name qualifier to WiFi node In-Reply-To: <5c6a1434-1f43-4434-b6ed-0c5b98ee8d2b@ixit.cz> References: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz> <20251111-xiaomi-beryllium-firmware-v1-2-836b9c51ad86@ixit.cz> <5c6a1434-1f43-4434-b6ed-0c5b98ee8d2b@ixit.cz> Message-ID: On 11/11/25 3:23 PM, David Heidelberg wrote: > On 11/11/2025 13:46, Dmitry Baryshkov wrote: >> On Tue, Nov 11, 2025 at 01:34:23PM +0100, David Heidelberg via B4 Relay wrote: >>> From: David Heidelberg >>> >>> Add firmware-name property to the WiFi device tree node to specify >>> board-specific lookup directory. >>> >>> Signed-off-by: David Heidelberg >>> --- >>> arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi | 2 ++ >>>
1 file changed, 2 insertions(+) >>> >>> diff --git a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi >>> index 785006a15e979..9b0b0446f4ad3 100644 >>> --- a/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi >>> +++ b/arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi >>> @@ -631,6 +631,8 @@ &wcd9340 { >>> &wifi { >>> status = "okay"; >>> + firmware-name "sdm845/Xiaomi/beryllium"; >> >> This wasn't build-tested > > Sorry, I wanted to send it more like RFC to get initial feedback, I got user with Foco F1 who is willing to test the changes, so I should have new version with T-b until EOD. Nothing in this thread seems to suggest this still awaits testing :/ Konrad From johannes at sipsolutions.net Wed Nov 12 03:51:43 2025 From: johannes at sipsolutions.net (Johannes Berg) Date: Wed, 12 Nov 2025 12:51:43 +0100 Subject: pull-request: ath-next-20251111 In-Reply-To: <15a98cae-0274-45f4-9b8e-be6fa9720884@oss.qualcomm.com> References: <15a98cae-0274-45f4-9b8e-be6fa9720884@oss.qualcomm.com> Message-ID: <7d445736914f971bfbf89b3480cd6552434eaf7f.camel@sipsolutions.net> On Tue, 2025-11-11 at 09:28 -0800, Jeff Johnson wrote: > > Once pulled into wireless-next, ath-next will fast-forward, and that > will provide the baseline for merging ath12k-ng into ath-next. I just sent a PR to net-next with this, so you might want to wait until that's merged and I pull net-next back, to have all the current net content as well.
johannes From robh at kernel.org Wed Nov 12 06:26:49 2025 From: robh at kernel.org (Rob Herring (Arm)) Date: Wed, 12 Nov 2025 08:26:49 -0600 Subject: [PATCH 0/2] ath10k: Introduce a firmware quirk to skip host cap QMI requests In-Reply-To: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz> References: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz> Message-ID: <176295563430.1637854.8194190408456563396.robh@kernel.org> On Tue, 11 Nov 2025 13:34:21 +0100, David Heidelberg wrote: > It's follow up of recent discussion from > > https://lore.kernel.org/all/20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz/ > > doing the workaround directly in firmware, so we don't pollute > device-tree. > > I added the change needed to be done in Xiaomi Poco F1, so it's grouped, > but I'm open to getting in first commit and sending the second later, > when all firmwares and tools changes land. > > References: > - https://gitlab.com/kernel-firmware/linux-firmware/-/merge_requests/780 > - https://github.com/qca/qca-swiss-army-knife/pull/13 > > Signed-off-by: David Heidelberg > --- > David Heidelberg (2): > ath10k: Introduce a firmware quirk to skip host cap QMI requests > arm64: dts: qcom: xiaomi-beryllium: Add firmware-name qualifier to WiFi node > > .../arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi | 2 ++ > drivers/net/wireless/ath/ath10k/core.c | 1 + > drivers/net/wireless/ath/ath10k/core.h | 3 +++ > drivers/net/wireless/ath/ath10k/qmi.c | 13 ++++++++++--- > 4 files changed, 16 insertions(+), 3 deletions(-) > --- > base-commit: 2666975a8905776d306bee01c5d98a0395bda1c9 > change-id: 20251111-xiaomi-beryllium-firmware-d8134ce67fec > > Best regards, > -- > David Heidelberg > > > My bot found new DTB warnings on the .dts files added or changed in this series. Some warnings may be from an existing SoC .dtsi. Or perhaps the warnings are fixed by another series.
Ultimately, it is up to the platform maintainer whether these warnings are acceptable or not. No need to reply unless the platform maintainer has comments. If you already ran DT checks and didn't see these error(s), then make sure dt-schema is up to date: pip3 install dtschema --upgrade This patch series was applied (using b4) to base: Base: 2666975a8905776d306bee01c5d98a0395bda1c9 (use --merge-base to override) If this is not the correct base, please add 'base-commit' tag (or use b4 which does this automatically) New warnings running 'make CHECK_DTBS=y for arch/arm64/boot/dts/qcom/' for 20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86 at ixit.cz: Error: arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi:634.16-41 syntax error FATAL ERROR: Unable to parse input tree make[3]: *** [scripts/Makefile.dtbs:132: arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-ebbg.dtb] Error 1 make[2]: *** [scripts/Makefile.build:556: arch/arm64/boot/dts/qcom] Error 2 make[2]: Target 'arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-ebbg.dtb' not remade because of errors. make[1]: *** [/home/rob/proj/linux-dt-testing/Makefile:1500: qcom/sdm845-xiaomi-beryllium-ebbg.dtb] Error 2 Error: arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-common.dtsi:634.16-41 syntax error FATAL ERROR: Unable to parse input tree make[3]: *** [scripts/Makefile.dtbs:132: arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-tianma.dtb] Error 1 make[2]: *** [scripts/Makefile.build:556: arch/arm64/boot/dts/qcom] Error 2 make[2]: Target 'arch/arm64/boot/dts/qcom/sdm845-xiaomi-beryllium-tianma.dtb' not remade because of errors. make[1]: *** [/home/rob/proj/linux-dt-testing/Makefile:1500: qcom/sdm845-xiaomi-beryllium-tianma.dtb] Error 2 make: *** [Makefile:248: __sub-make] Error 2 make: Target 'qcom/apq8096-ifc6640.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-j3ltetw.dtb' not remade because of errors. make: Target 'qcom/msm8998-fxtec-pro1.dtb' not remade because of errors. 
make: Target 'qcom/sm7325-nothing-spacewar.dtb' not remade because of errors. make: Target 'qcom/x1e80100-asus-zenbook-a14.dtb' not remade because of errors. make: Target 'qcom/sm7125-xiaomi-curtana.dtb' not remade because of errors. make: Target 'qcom/x1e80100-dell-xps13-9345.dtb' not remade because of errors. make: Target 'qcom/msm8998-mtp.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-a5u-eur.dtb' not remade because of errors. make: Target 'qcom/sc8280xp-lenovo-thinkpad-x13s.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r3-lte.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-crd-pro.dtb' not remade because of errors. make: Target 'qcom/sm6115p-lenovo-j606f.dtb' not remade because of errors. make: Target 'qcom/msm8998-sony-xperia-yoshino-maple.dtb' not remade because of errors. make: Target 'qcom/ipq9574-rdp454.dtb' not remade because of errors. make: Target 'qcom/qcs6490-rb3gen2.dtb' not remade because of errors. make: Target 'qcom/msm8992-xiaomi-libra.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-limozeen-r4.dtb' not remade because of errors. make: Target 'qcom/sdm450-motorola-ali.dtb' not remade because of errors. make: Target 'qcom/x1e78100-lenovo-thinkpad-t14s-oled.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-quackingstick-r0.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pazquel360-wifi.dtb' not remade because of errors. make: Target 'qcom/sdm630-sony-xperia-ganges-kirin.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-coachz-r1-lte.dtb' not remade because of errors. make: Target 'qcom/x1e80100-dell-latitude-7455.dtb' not remade because of errors. make: Target 'qcom/sdm845-lg-judyp.dtb' not remade because of errors. make: Target 'qcom/msm8939-wingtech-wt82918.dtb' not remade because of errors. make: Target 'qcom/qrb2210-rb1.dtb' not remade because of errors. 
make: Target 'qcom/msm8996-mtp.dtb' not remade because of errors. make: Target 'qcom/sm8750-mtp.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-zombie.dtb' not remade because of errors. make: Target 'qcom/msm8992-lg-bullhead-rev-10.dtb' not remade because of errors. make: Target 'qcom/qrb5165-rb5.dtb' not remade because of errors. make: Target 'qcom/x1e80100-lenovo-yoga-slim7x.dtb' not remade because of errors. make: Target 'qcom/sm8550-qrd.dtb' not remade because of errors. make: Target 'qcom/sdm630-sony-xperia-nile-discovery.dtb' not remade because of errors. make: Target 'qcom/sm8550-sony-xperia-yodo-pdx234.dtb' not remade because of errors. make: Target 'qcom/msm8939-huawei-kiwi.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-wormdingler-rev1-inx.dtb' not remade because of errors. make: Target 'qcom/sc8280xp-microsoft-arcata.dtb' not remade because of errors. make: Target 'qcom/sdm845-oneplus-fajita.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-limozeen-nots-r4.dtb' not remade because of errors. make: Target 'qcom/sdm660-xiaomi-lavender.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-coachz-r1.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r10.dtb' not remade because of errors. make: Target 'qcom/msm8939-wingtech-wt82918hd.dtb' not remade because of errors. make: Target 'qcom/ipq6018-cp01-c1.dtb' not remade because of errors. make: Target 'qcom/sm8250-samsung-x1q.dtb' not remade because of errors. make: Target 'qcom/msm8916-motorola-surnia.dtb' not remade because of errors. make: Target 'qcom/sm8350-microsoft-surface-duo2.dtb' not remade because of errors. make: Target 'qcom/qcm6490-idp.dtb' not remade because of errors. make: Target 'qcom/sm8550-mtp.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-a3u-eur.dtb' not remade because of errors. 
make: Target 'qcom/sdm845-sony-xperia-tama-akari.dtb' not remade because of errors. make: Target 'qcom/x1p42100-crd.dtb' not remade because of errors. make: Target 'qcom/sm8250-mtp.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-limozeen-nots-r9.dtb' not remade because of errors. make: Target 'qcom/sm8250-xiaomi-elish-csot.dtb' not remade because of errors. make: Target 'qcom/msm8916-wingtech-wt88047.dtb' not remade because of errors. make: Target 'qcom/msm8916-thwc-ufi001c.dtb' not remade because of errors. make: Target 'qcom/msm8998-xiaomi-sagit.dtb' not remade because of errors. make: Target 'qcom/qcm6490-particle-tachyon.dtb' not remade because of errors. make: Target 'qcom/qcs8550-aim300-aiot.dtb' not remade because of errors. make: Target 'qcom/sdm450-lenovo-tbx605f.dtb' not remade because of errors. make: Target 'qcom/sm8250-xiaomi-elish-boe.dtb' not remade because of errors. make: Target 'qcom/qcs404-evb-4000.dtb' not remade because of errors. make: Target 'qcom/qcs9100-ride.dtb' not remade because of errors. make: Target 'qcom/msm8996-sony-xperia-tone-kagura.dtb' not remade because of errors. make: Target 'qcom/sm8150-sony-xperia-kumano-griffin.dtb' not remade because of errors. make: Target 'qcom/sdm670-google-sargo.dtb' not remade because of errors. make: Target 'qcom/x1e001de-devkit.dtb' not remade because of errors. make: Target 'qcom/sa8775p-ride.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-crd.dtb' not remade because of errors. make: Target 'qcom/ipq5424-rdp466.dtb' not remade because of errors. make: Target 'qcom/sc8180x-lenovo-flex-5g.dtb' not remade because of errors. make: Target 'qcom/sdm845-lg-judyln.dtb' not remade because of errors. make: Target 'qcom/msm8953-flipkart-rimob.dtb' not remade because of errors. make: Target 'qcom/sm6125-xiaomi-ginkgo.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r3-kb.dtb' not remade because of errors. 
make: Target 'qcom/msm8916-motorola-osprey.dtb' not remade because of errors. make: Target 'qcom/sm8250-xiaomi-pipa.dtb' not remade because of errors. make: Target 'qcom/sdm845-oneplus-enchilada.dtb' not remade because of errors. make: Target 'qcom/msm8956-sony-xperia-loire-suzu.dtb' not remade because of errors. make: Target 'qcom/msm8937-xiaomi-land.dtb' not remade because of errors. make: Target 'qcom/sc7280-idp.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-evoker-lte.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-homestar-r4.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-rossa.dtb' not remade because of errors. make: Target 'qcom/apq8039-t2.dtb' not remade because of errors. make: Target 'qcom/msm8916-motorola-harpia.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-e5.dtb' not remade because of errors. make: Target 'qcom/sc7280-idp2.dtb' not remade because of errors. make: Target 'qcom/msm8939-sony-xperia-kanuti-tulip.dtb' not remade because of errors. make: Target 'qcom/sm8250-samsung-r8q.dtb' not remade because of errors. make: Target 'qcom/ipq8074-hk01.dtb' not remade because of errors. make: Target 'qcom/sm8150-mtp.dtb' not remade because of errors. make: Target 'qcom/ipq9574-rdp433.dtb' not remade because of errors. make: Target 'qcom/sdm845-sony-xperia-tama-apollo.dtb' not remade because of errors. make: Target 'qcom/msm8998-lenovo-miix-630.dtb' not remade because of errors. make: Target 'qcom/msm8994-sony-xperia-kitakami-karin.dtb' not remade because of errors. make: Target 'qcom/sdm630-sony-xperia-nile-pioneer.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-grandmax.dtb' not remade because of errors. make: Target 'qcom/msm8916-alcatel-idol347.dtb' not remade because of errors. make: Target 'qcom/ipq9574-rdp453.dtb' not remade because of errors. make: Target 'qcom/sc7180-acer-aspire1.dtb' not remade because of errors. 
make: Target 'qcom/sc7180-trogdor-r1.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-e7.dtb' not remade because of errors. make: Target 'qcom/ipq5018-rdp432-c2.dtb' not remade because of errors. make: Target 'qcom/apq8016-schneider-hmibsc.dtb' not remade because of errors. make: Target 'qcom/qrb4210-rb2.dtb' not remade because of errors. make: Target 'qcom/x1p42100-hp-omnibook-x14.dtb' not remade because of errors. make: Target 'qcom/ipq5018-tplink-archer-ax55-v1.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-evoker.dtb' not remade because of errors. make: Target 'qcom/sdm850-huawei-matebook-e-2019.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-kingoftown.dtb' not remade because of errors. make: Target 'qcom/sm4450-qrd.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-j5.dtb' not remade because of errors. make: Target 'qcom/msm8998-asus-novago-tp370ql.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pompom-r2-lte.dtb' not remade because of errors. make: Target 'qcom/msm8992-lg-h815.dtb' not remade because of errors. make: Target 'qcom/sdx75-idp.dtb' not remade because of errors. make: Target 'qcom/sm8350-sony-xperia-sagami-pdx215.dtb' not remade because of errors. make: Target 'qcom/apq8096-db820c.dtb' not remade because of errors. make: Target 'qcom/msm8996-sony-xperia-tone-keyaki.dtb' not remade because of errors. make: Target 'qcom/msm8916-longcheer-l8150.dtb' not remade because of errors. make: Target 'qcom/msm8994-sony-xperia-kitakami-suzuran.dtb' not remade because of errors. make: Target 'qcom/sdm845-mtp.dtb' not remade because of errors. make: Target 'qcom/sm6375-sony-xperia-murray-pdx225.dtb' not remade because of errors. make: Target 'qcom/msm8916-yiming-uz801v3.dtb' not remade because of errors. make: Target 'qcom/qcs9100-ride-r3.dtb' not remade because of errors. make: Target 'qcom/x1e80100-hp-omnibook-x14.dtb' not remade because of errors. 
make: Target 'qcom/msm8953-xiaomi-vince.dtb' not remade because of errors. make: Target 'qcom/ipq5332-rdp441.dtb' not remade because of errors. make: Target 'qcom/msm8992-lg-bullhead-rev-101.dtb' not remade because of errors. make: Target 'qcom/msm8917-xiaomi-riva.dtb' not remade because of errors. make: Target 'qcom/msm8996-xiaomi-gemini.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-limozeen-r9.dtb' not remade because of errors. make: Target 'qcom/msm8998-sony-xperia-yoshino-lilac.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-gprimeltecan.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pazquel360-lte.dtb' not remade because of errors. make: Target 'qcom/sdm845-shift-axolotl.dtb' not remade because of errors. make: Target 'qcom/msm8996-oneplus3t.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-zombie-lte.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r3.dtb' not remade because of errors. make: Target 'qcom/monaco-evk.dtb' not remade because of errors. make: Target 'qcom/sar2130p-qar2130p.dtb' not remade because of errors. make: Target 'qcom/sm8650-hdk.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-herobrine-r1.dtb' not remade because of errors. make: Target 'qcom/msm8916-longcheer-l8910.dtb' not remade because of errors. make: Target 'qcom/sdm630-sony-xperia-nile-voyager.dtb' not remade because of errors. make: Target 'qcom/sm8450-hdk.dtb' not remade because of errors. make: Target 'qcom/msm8929-wingtech-wt82918hd.dtb' not remade because of errors. make: Target 'qcom/sm8250-sony-xperia-edo-pdx203.dtb' not remade because of errors. make: Target 'qcom/sm8350-hdk.dtb' not remade because of errors. make: Target 'qcom/ipq8074-hk10-c1.dtb' not remade because of errors. make: Target 'qcom/sm8450-qrd.dtb' not remade because of errors. make: Target 'qcom/msm8916-lg-c50.dtb' not remade because of errors. 
make: Target 'qcom/sm8250-sony-xperia-edo-pdx206.dtb' not remade because of errors. make: Target 'qcom/sm7225-fairphone-fp4.dtb' not remade because of errors. make: Target 'qcom/sa8155p-adp.dtb' not remade because of errors. make: Target 'qcom/x1e80100-qcp.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r1-kb.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-grandprimelte.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-zombie-nvme-lte.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-homestar-r3.dtb' not remade because of errors. make: Target 'qcom/ipq5332-rdp474.dtb' not remade because of errors. make: Target 'qcom/x1e80100-asus-vivobook-s15.dtb' not remade because of errors. make: Target 'qcom/sm8150-microsoft-surface-duo.dtb' not remade because of errors. make: Target 'qcom/msm8996pro-xiaomi-scorpio.dtb' not remade because of errors. make: Target 'qcom/x1e78100-lenovo-thinkpad-t14s.dtb' not remade because of errors. make: Target 'qcom/sm8150-hdk.dtb' not remade because of errors. make: Target 'qcom/sc8180x-primus.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r10-lte.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-j5x.dtb' not remade because of errors. make: Target 'qcom/x1p42100-asus-zenbook-a14.dtb' not remade because of errors. make: Target 'qcom/sc7180-idp.dtb' not remade because of errors. make: Target 'qcom/msm8916-mtp.dtb' not remade because of errors. make: Target 'qcom/x1e80100-hp-elitebook-ultra-g1q.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-limozeen-r10.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-zombie-nvme.dtb' not remade because of errors. make: Target 'qcom/x1e80100-microsoft-romulus15.dtb' not remade because of errors. make: Target 'qcom/qru1000-idp.dtb' not remade because of errors. 
make: Target 'qcom/msm8998-hp-envy-x2.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-wormdingler-rev1-boe-rt5682s.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pazquel-parade.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r9-kb.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-wormdingler-rev1-boe.dtb' not remade because of errors. make: Target 'qcom/qcs615-ride.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-coachz-r3-lte.dtb' not remade because of errors. make: Target 'qcom/sc7280-crd-r3.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-gt58.dtb' not remade because of errors. make: Target 'qcom/sa8775p-ride-r3.dtb' not remade because of errors. make: Target 'qcom/sm8450-samsung-r0q.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-villager-r1.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pazquel-ti.dtb' not remade because of errors. make: Target 'qcom/qcm6490-shift-otter.dtb' not remade because of errors. make: Target 'qcom/hamoa-iot-evk.dtb' not remade because of errors. make: Target 'qcom/qcs8300-ride.dtb' not remade because of errors. make: Target 'qcom/apq8016-sbc.dtb' not remade because of errors. make: Target 'qcom/msm8996pro-xiaomi-natrium.dtb' not remade because of errors. make: Target 'qcom/sdm845-samsung-starqltechn.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pompom-r1-lte.dtb' not remade because of errors. make: Target 'qcom/msm8953-xiaomi-tissot.dtb' not remade because of errors. make: Target 'qcom/x1e80100-dell-inspiron-14-plus-7441.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r9.dtb' not remade because of errors. make: Target 'qcom/sm6125-xiaomi-laurel-sprout.dtb' not remade because of errors. make: Target 'qcom/msm8994-sony-xperia-kitakami-sumire.dtb' not remade because of errors. 
make: Target 'qcom/msm8916-samsung-serranove.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-coachz-r3.dtb' not remade because of errors. make: Target 'qcom/sdm845-sony-xperia-tama-akatsuki.dtb' not remade because of errors. make: Target 'qcom/ipq9574-rdp449.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-r1-lte.dtb' not remade because of errors. make: Target 'qcom/msm8916-lg-m216.dtb' not remade because of errors. make: Target 'qcom/msm8939-asus-z00t.dtb' not remade because of errors. make: Target 'qcom/lemans-evk.dtb' not remade because of errors. make: Target 'qcom/x1e80100-crd.dtb' not remade because of errors. make: Target 'qcom/x1p42100-lenovo-thinkbook-16.dtb' not remade because of errors. make: Target 'qcom/apq8094-sony-xperia-kitakami-karin_windy.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r9-lte.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pazquel-lte-ti.dtb' not remade because of errors. make: Target 'qcom/msm8996-sony-xperia-tone-dora.dtb' not remade because of errors. make: Target 'qcom/sa8295p-adp.dtb' not remade because of errors. make: Target 'qcom/msm8994-sony-xperia-kitakami-ivy.dtb' not remade because of errors. make: Target 'qcom/sdm845-xiaomi-beryllium-ebbg.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pompom-r3.dtb' not remade because of errors. make: Target 'qcom/msm8998-oneplus-dumpling.dtb' not remade because of errors. make: Target 'qcom/sm8650-mtp.dtb' not remade because of errors. make: Target 'qcom/msm8996-oneplus3.dtb' not remade because of errors. make: Target 'qcom/sm8550-hdk.dtb' not remade because of errors. make: Target 'qcom/x1e80100-microsoft-romulus13.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r1-lte.dtb' not remade because of errors. make: Target 'qcom/msm8939-samsung-a7.dtb' not remade because of errors. 
make: Target 'qcom/qcm6490-fairphone-fp5.dtb' not remade because of errors. make: Target 'qcom/sc8280xp-huawei-gaokun3.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-wormdingler-rev1-inx-rt5682s.dtb' not remade because of errors. make: Target 'qcom/msm8953-xiaomi-mido.dtb' not remade because of errors. make: Target 'qcom/msm8916-asus-z00l.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pompom-r2.dtb' not remade because of errors. make: Target 'qcom/sm6350-sony-xperia-lena-pdx213.dtb' not remade because of errors. make: Target 'qcom/sdm632-fairphone-fp3.dtb' not remade because of errors. make: Target 'qcom/x1p42100-asus-zenbook-a14-lcd.dtb' not remade because of errors. make: Target 'qcom/msm8953-motorola-potter.dtb' not remade because of errors. make: Target 'qcom/sda660-inforce-ifc6560.dtb' not remade because of errors. make: Target 'qcom/sm8150-sony-xperia-kumano-bahamut.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pazquel-lte-parade.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-homestar-r2.dtb' not remade because of errors. make: Target 'qcom/sm8250-hdk.dtb' not remade because of errors. make: Target 'qcom/sm8650-qrd.dtb' not remade because of errors. make: Target 'qcom/sc8280xp-microsoft-blackrock.dtb' not remade because of errors. make: Target 'qcom/ipq8074-hk10-c2.dtb' not remade because of errors. make: Target 'qcom/msm8953-xiaomi-daisy.dtb' not remade because of errors. make: Target 'qcom/sc8280xp-crd.dtb' not remade because of errors. make: Target 'qcom/sdm850-samsung-w737.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-limozeen-nots-r5.dtb' not remade because of errors. make: Target 'qcom/msm8916-samsung-gt510.dtb' not remade because of errors. make: Target 'qcom/sdm850-lenovo-yoga-c630.dtb' not remade because of errors. make: Target 'qcom/msm8916-thwc-uf896.dtb' not remade because of errors. 
make: Target 'qcom/sc7180-trogdor-lazor-r10-kb.dtb' not remade because of errors. make: Target 'qcom/msm8994-sony-xperia-kitakami-satsuki.dtb' not remade because of errors. make: Target 'qcom/sdm632-motorola-ocean.dtb' not remade because of errors. make: Target 'qcom/sc7280-herobrine-villager-r1-lte.dtb' not remade because of errors. make: Target 'qcom/sm6115-fxtec-pro1x.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pompom-r3-lte.dtb' not remade because of errors. make: Target 'qcom/msm8998-sony-xperia-yoshino-poplar.dtb' not remade because of errors. make: Target 'qcom/qcs6490-radxa-dragon-q6a.dtb' not remade because of errors. make: Target 'qcom/msm8916-huawei-g7.dtb' not remade because of errors. make: Target 'qcom/msm8916-wingtech-wt86518.dtb' not remade because of errors. make: Target 'qcom/sm8350-sony-xperia-sagami-pdx214.dtb' not remade because of errors. make: Target 'qcom/msm8916-wingtech-wt86528.dtb' not remade because of errors. make: Target 'qcom/sdm845-db845c.dtb' not remade because of errors. make: Target 'qcom/sa8540p-ride.dtb' not remade because of errors. make: Target 'qcom/msm8939-longcheer-l9100.dtb' not remade because of errors. make: Target 'qcom/qdu1000-idp.dtb' not remade because of errors. make: Target 'qcom/sm8550-samsung-q5q.dtb' not remade because of errors. make: Target 'qcom/msm8992-msft-lumia-octagon-talkman.dtb' not remade because of errors. make: Target 'qcom/msm8916-gplus-fl8005a.dtb' not remade because of errors. make: Target 'qcom/sm8350-mtp.dtb' not remade because of errors. make: Target 'qcom/msm8956-sony-xperia-loire-kugo.dtb' not remade because of errors. make: Target 'qcom/msm8976-longcheer-l9360.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-r1.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-pompom-r1.dtb' not remade because of errors. make: Target 'qcom/msm8998-oneplus-cheeseburger.dtb' not remade because of errors. 
make: Target 'qcom/sc7280-herobrine-villager-r0.dtb' not remade because of errors. make: Target 'qcom/sm8750-qrd.dtb' not remade because of errors. make: Target 'qcom/sm4250-oneplus-billie2.dtb' not remade because of errors. make: Target 'qcom/sdm636-sony-xperia-ganges-mermaid.dtb' not remade because of errors. make: Target 'qcom/qcs404-evb-1000.dtb' not remade because of errors. make: Target 'qcom/ipq5332-rdp442.dtb' not remade because of errors. make: Target 'qcom/msm8994-msft-lumia-octagon-cityman.dtb' not remade because of errors. make: Target 'qcom/msm8916-acer-a1-724.dtb' not remade because of errors. make: Target 'qcom/sdm845-xiaomi-beryllium-tianma.dtb' not remade because of errors. make: Target 'qcom/sm6125-sony-xperia-seine-pdx201.dtb' not remade because of errors. make: Target 'qcom/sdm845-xiaomi-polaris.dtb' not remade because of errors. make: Target 'qcom/ipq9574-rdp418.dtb' not remade because of errors. make: Target 'qcom/msm8216-samsung-fortuna3g.dtb' not remade because of errors. make: Target 'qcom/sm8450-sony-xperia-nagara-pdx223.dtb' not remade because of errors. make: Target 'qcom/sm8450-sony-xperia-nagara-pdx224.dtb' not remade because of errors. make: Target 'qcom/sm7125-xiaomi-joyeuse.dtb' not remade because of errors. make: Target 'qcom/msm8994-huawei-angler-rev-101.dtb' not remade because of errors. make: Target 'qcom/ipq5332-rdp468.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-lazor-limozeen-nots-r10.dtb' not remade because of errors. make: Target 'qcom/sc7180-trogdor-quackingstick-r0-lte.dtb' not remade because of errors. 
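The syntax error reported above at sdm845-xiaomi-beryllium-common.dtsi:634.16-41 appears to point at the added firmware-name line: column 16 is roughly where the quoted string begins, and the posted hunk has no `=` between the property name and its value. A minimal sketch of the &wifi node with only that character added, assuming the path string is what the author intended (the thread does not show a corrected version, so this is not the submitter's v2):

```dts
&wifi {
	status = "okay";

	/* Assumption: same value as in the posted patch, with the missing '=' added. */
	firmware-name = "sdm845/Xiaomi/beryllium";

	vdd-0.8-cx-mx-supply = <&vreg_l5a_0p8>;
	vdd-1.8-xo-supply = <&vreg_l7a_1p8>;
	vdd-1.3-rfa-supply = <&vreg_l17a_1p3>;
};
```

Rerunning the same 'make CHECK_DTBS=y' step from the report on the two beryllium targets (ebbg and tianma) should show whether the tree now parses. Whether the value passes the binding check is a separate question that this sketch does not answer.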
From jeff.johnson at oss.qualcomm.com Wed Nov 12 06:58:04 2025 From: jeff.johnson at oss.qualcomm.com (Jeff Johnson) Date: Wed, 12 Nov 2025 06:58:04 -0800 Subject: pull-request: ath-next-20251111 In-Reply-To: <7d445736914f971bfbf89b3480cd6552434eaf7f.camel@sipsolutions.net> References: <15a98cae-0274-45f4-9b8e-be6fa9720884@oss.qualcomm.com> <7d445736914f971bfbf89b3480cd6552434eaf7f.camel@sipsolutions.net> Message-ID: On 11/12/2025 3:51 AM, Johannes Berg wrote: > On Tue, 2025-11-11 at 09:28 -0800, Jeff Johnson wrote: >> >> Once pulled into wireless-next, ath-next will fast-forward, and that >> will provide the baseline for merging ath12k-ng into ath-next. > > I just sent a PR to net-next with this, so you might want to wait until > that's merged and I pull net-next back, to have all the current net > content as well. Will do, thanks! /jeff From davidwronek at gmail.com Wed Nov 12 09:35:29 2025 From: davidwronek at gmail.com (David Wronek) Date: Wed, 12 Nov 2025 18:35:29 +0100 Subject: SPF Test (ignore me) Message-ID: Hello, just testing if I can reach this mailing list. Best regards David From W_Armin at gmx.de Thu Nov 13 19:23:01 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Fri, 14 Nov 2025 04:23:01 +0100 Subject: [PATCH RFC 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices Message-ID: <20251114-thermal-device-v1-0-d8b442aae38b@gmx.de> Drivers registering thermal zone/cooling devices are currently unable to tell the thermal core what parent device the new thermal zone/cooling device should have, potentially causing issues with suspend ordering and making it impossible for user space applications to associate a given thermal zone device with its parent device. This patch series aims to fix this issue by extending the functions used to register thermal zone/cooling devices to also accept a parent device pointer.
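For readers skimming the thread, the idea can be sketched in plain userspace C. This is only an illustration under stated assumptions: `mock_device`, `mock_cooling_device` and `mock_cooling_device_register()` are hypothetical stand-ins and not kernel APIs. The one thing taken from the series is the assignment of the parent pointer into the embedded device, which patch 1/8 does with `cdev->device.parent = parent`.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct device. */
struct mock_device {
	const char *name;
	struct mock_device *parent;
};

/* Hypothetical stand-in for a cooling device embedding a device. */
struct mock_cooling_device {
	struct mock_device device;
	void *devdata;
};

/*
 * Hypothetical registration helper. It records @parent in the embedded
 * device so that the hierarchy shows which device owns the cooling device.
 * Passing NULL keeps the parentless behaviour.
 */
static void mock_cooling_device_register(struct mock_cooling_device *cdev,
					 struct mock_device *parent,
					 const char *type, void *devdata)
{
	cdev->device.name = type;
	cdev->device.parent = parent;
	cdev->devdata = devdata;
}
```

In this sketch a caller that knows its owning device passes it as the second argument, while a caller without one passes NULL and gets the same result as before the change.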
The first six patches convert all functions used for registering cooling devices, while the functions used for registering thermal zone devices are converted by the remaining two patches. I tested this series on various devices containing (among others): - ACPI thermal zones - ACPI processor devices - PCIe cooling devices - Intel Wifi card - Intel powerclamp - Intel TCC cooling I also compile-tested the remaining affected drivers; however, I would still be happy if the relevant maintainers (especially those of the Mellanox ethernet switch driver) could take a quick glance at the code and verify that I am using the correct device as the parent device. This work is also necessary for extending the ACPI thermal zone driver to support the _TZD ACPI object in the future. Signed-off-by: Armin Wolf --- Armin Wolf (8): thermal: core: Allow setting the parent device of cooling devices thermal: core: Set parent device in thermal_of_cooling_device_register() ACPI: processor: Stop creating "device" sysfs link ACPI: fan: Stop creating "device" sysfs link ACPI: video: Stop creating "device" sysfs link thermal: core: Set parent device in thermal_cooling_device_register() ACPI: thermal: Stop creating "device" sysfs link thermal: core: Allow setting the parent device of thermal zone devices Documentation/driver-api/thermal/sysfs-api.rst | 10 ++++- drivers/acpi/acpi_video.c | 9 +---- drivers/acpi/fan_core.c | 16 ++------ drivers/acpi/processor_thermal.c | 15 +------ drivers/acpi/thermal.c | 33 ++++++--------- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 +- drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c | 4 +- drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 47 +++++++++++----------- drivers/net/wireless/ath/ath10k/thermal.c | 2 +- drivers/net/wireless/ath/ath11k/thermal.c | 2 +- drivers/net/wireless/intel/iwlwifi/mld/thermal.c | 6 +-- drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 12 +++--- drivers/net/wireless/mediatek/mt76/mt7915/init.c | 2 +-
drivers/net/wireless/mediatek/mt76/mt7996/init.c | 2 +- drivers/platform/x86/acerhdf.c | 4 +- drivers/power/supply/power_supply_core.c | 4 +- drivers/thermal/armada_thermal.c | 2 +- drivers/thermal/cpufreq_cooling.c | 2 +- drivers/thermal/cpuidle_cooling.c | 2 +- drivers/thermal/da9062-thermal.c | 2 +- drivers/thermal/devfreq_cooling.c | 2 +- drivers/thermal/dove_thermal.c | 2 +- drivers/thermal/imx_thermal.c | 2 +- .../intel/int340x_thermal/int3400_thermal.c | 2 +- .../intel/int340x_thermal/int3403_thermal.c | 4 +- .../intel/int340x_thermal/int3406_thermal.c | 2 +- .../intel/int340x_thermal/int340x_thermal_zone.c | 13 +++--- .../int340x_thermal/processor_thermal_device_pci.c | 7 ++-- drivers/thermal/intel/intel_pch_thermal.c | 2 +- drivers/thermal/intel/intel_powerclamp.c | 2 +- drivers/thermal/intel/intel_quark_dts_thermal.c | 2 +- drivers/thermal/intel/intel_soc_dts_iosf.c | 2 +- drivers/thermal/intel/intel_tcc_cooling.c | 2 +- drivers/thermal/intel/x86_pkg_temp_thermal.c | 6 +-- drivers/thermal/kirkwood_thermal.c | 2 +- drivers/thermal/pcie_cooling.c | 2 +- drivers/thermal/renesas/rcar_thermal.c | 10 +++-- drivers/thermal/spear_thermal.c | 2 +- drivers/thermal/tegra/soctherm.c | 5 +-- drivers/thermal/testing/zone.c | 2 +- drivers/thermal/thermal_core.c | 23 +++++++---- drivers/thermal/thermal_of.c | 9 +++-- include/linux/thermal.h | 22 +++++----- 43 files changed, 145 insertions(+), 162 deletions(-) --- base-commit: 399fb812cd1532773e6aa985c0949859221341c4 change-id: 20251114-thermal-device-655d138824c6 Best regards, -- Armin Wolf From W_Armin at gmx.de Thu Nov 13 19:23:02 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Fri, 14 Nov 2025 04:23:02 +0100 Subject: [PATCH RFC 1/8] thermal: core: Allow setting the parent device of cooling devices In-Reply-To: <20251114-thermal-device-v1-0-d8b442aae38b@gmx.de> References: <20251114-thermal-device-v1-0-d8b442aae38b@gmx.de> Message-ID: <20251114-thermal-device-v1-1-d8b442aae38b@gmx.de> Currently, cooling devices have no 
parent device, potentially causing issues with suspend ordering and making it impossible for consumers (thermal zones and userspace applications) to associate a given cooling device with its parent device. Extend __thermal_cooling_device_register() to also accept a parent device pointer. For now only devm_thermal_of_cooling_device_register() uses this, as the other wrapper functions need to be extended first. Signed-off-by: Armin Wolf --- drivers/thermal/thermal_core.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 17ca5c082643..c8b720194b44 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -1040,6 +1040,7 @@ static void thermal_cooling_device_init_complete(struct thermal_cooling_device * /** * __thermal_cooling_device_register() - register a new thermal cooling device + * @parent: parent device pointer. * @np: a pointer to a device tree node. * @type: the thermal cooling device type. * @devdata: device private data. @@ -1055,7 +1056,7 @@ static void thermal_cooling_device_init_complete(struct thermal_cooling_device * * ERR_PTR. Caller must check return value with IS_ERR*() helpers.
*/ static struct thermal_cooling_device * -__thermal_cooling_device_register(struct device_node *np, +__thermal_cooling_device_register(struct device *parent, struct device_node *np, const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { @@ -1092,6 +1093,7 @@ __thermal_cooling_device_register(struct device_node *np, cdev->ops = ops; cdev->updated = false; cdev->device.class = thermal_class; + cdev->device.parent = parent; cdev->devdata = devdata; ret = cdev->ops->get_max_state(cdev, &cdev->max_state); @@ -1158,7 +1160,7 @@ struct thermal_cooling_device * thermal_cooling_device_register(const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { - return __thermal_cooling_device_register(NULL, type, devdata, ops); + return __thermal_cooling_device_register(NULL, NULL, type, devdata, ops); } EXPORT_SYMBOL_GPL(thermal_cooling_device_register); @@ -1182,7 +1184,7 @@ thermal_of_cooling_device_register(struct device_node *np, const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { - return __thermal_cooling_device_register(np, type, devdata, ops); + return __thermal_cooling_device_register(NULL, np, type, devdata, ops); } EXPORT_SYMBOL_GPL(thermal_of_cooling_device_register); @@ -1222,7 +1224,7 @@ devm_thermal_of_cooling_device_register(struct device *dev, if (!ptr) return ERR_PTR(-ENOMEM); - tcd = __thermal_cooling_device_register(np, type, devdata, ops); + tcd = __thermal_cooling_device_register(dev, np, type, devdata, ops); if (IS_ERR(tcd)) { devres_free(ptr); return tcd; -- 2.39.5 From W_Armin at gmx.de Thu Nov 13 19:23:03 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Fri, 14 Nov 2025 04:23:03 +0100 Subject: [PATCH RFC 2/8] thermal: core: Set parent device in thermal_of_cooling_device_register() In-Reply-To: <20251114-thermal-device-v1-0-d8b442aae38b@gmx.de> References: <20251114-thermal-device-v1-0-d8b442aae38b@gmx.de> Message-ID: <20251114-thermal-device-v1-2-d8b442aae38b@gmx.de> Extend 
thermal_of_cooling_device_register() to allow users to specify the parent device of the cooling device to be created. Signed-off-by: Armin Wolf --- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 ++-- drivers/thermal/cpufreq_cooling.c | 2 +- drivers/thermal/cpuidle_cooling.c | 2 +- drivers/thermal/devfreq_cooling.c | 2 +- drivers/thermal/tegra/soctherm.c | 5 ++--- drivers/thermal/thermal_core.c | 5 +++-- include/linux/thermal.h | 9 ++++----- 7 files changed, 14 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c index cf0d9049bcf1..f2c98e46a1c6 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c @@ -1778,8 +1778,8 @@ static int etnaviv_gpu_bind(struct device *dev, struct device *master, int ret; if (IS_ENABLED(CONFIG_DRM_ETNAVIV_THERMAL)) { - gpu->cooling = thermal_of_cooling_device_register(dev->of_node, - (char *)dev_name(dev), gpu, &cooling_ops); + gpu->cooling = thermal_of_cooling_device_register(dev, dev->of_node, dev_name(dev), + gpu, &cooling_ops); if (IS_ERR(gpu->cooling)) return PTR_ERR(gpu->cooling); } diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c index 6b7ab1814c12..af9250c44da7 100644 --- a/drivers/thermal/cpufreq_cooling.c +++ b/drivers/thermal/cpufreq_cooling.c @@ -593,7 +593,7 @@ __cpufreq_cooling_register(struct device_node *np, if (!name) goto remove_qos_req; - cdev = thermal_of_cooling_device_register(np, name, cpufreq_cdev, + cdev = thermal_of_cooling_device_register(dev, np, name, cpufreq_cdev, cooling_ops); kfree(name); diff --git a/drivers/thermal/cpuidle_cooling.c b/drivers/thermal/cpuidle_cooling.c index f678c1281862..520c89a36d90 100644 --- a/drivers/thermal/cpuidle_cooling.c +++ b/drivers/thermal/cpuidle_cooling.c @@ -207,7 +207,7 @@ static int __cpuidle_cooling_register(struct device_node *np, goto out_unregister; } - cdev = thermal_of_cooling_device_register(np, name, idle_cdev, + cdev = 
thermal_of_cooling_device_register(dev, np, name, idle_cdev, &cpuidle_cooling_ops); if (IS_ERR(cdev)) { ret = PTR_ERR(cdev); diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c index 8fd7cf1932cd..d91695ed0f26 100644 --- a/drivers/thermal/devfreq_cooling.c +++ b/drivers/thermal/devfreq_cooling.c @@ -454,7 +454,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df, if (!name) goto remove_qos_req; - cdev = thermal_of_cooling_device_register(np, name, dfc, ops); + cdev = thermal_of_cooling_device_register(dev, np, name, dfc, ops); kfree(name); if (IS_ERR(cdev)) { diff --git a/drivers/thermal/tegra/soctherm.c b/drivers/thermal/tegra/soctherm.c index 5d26b52beaba..4f43da123be4 100644 --- a/drivers/thermal/tegra/soctherm.c +++ b/drivers/thermal/tegra/soctherm.c @@ -1700,9 +1700,8 @@ static void soctherm_init_hw_throt_cdev(struct platform_device *pdev) stc->init = true; } else { - tcd = thermal_of_cooling_device_register(np_stcc, - (char *)name, ts, - &throt_cooling_ops); + tcd = thermal_of_cooling_device_register(dev, np_stcc, name, ts, + &throt_cooling_ops); if (IS_ERR_OR_NULL(tcd)) { dev_err(dev, "throttle-cfg: %s: failed to register cooling device\n", diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index c8b720194b44..5d752e712cc0 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -1166,6 +1166,7 @@ EXPORT_SYMBOL_GPL(thermal_cooling_device_register); /** * thermal_of_cooling_device_register() - register an OF thermal cooling device + * @parent: parent device pointer. * @np: a pointer to a device tree node. * @type: the thermal cooling device type. * @devdata: device private data. @@ -1180,11 +1181,11 @@ EXPORT_SYMBOL_GPL(thermal_cooling_device_register); * ERR_PTR. Caller must check return value with IS_ERR*() helpers. 
*/ struct thermal_cooling_device * -thermal_of_cooling_device_register(struct device_node *np, +thermal_of_cooling_device_register(struct device *parent, struct device_node *np, const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { - return __thermal_cooling_device_register(NULL, np, type, devdata, ops); + return __thermal_cooling_device_register(parent, np, type, devdata, ops); } EXPORT_SYMBOL_GPL(thermal_of_cooling_device_register); diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 0b5ed6821080..fa53d12173ce 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -253,8 +253,8 @@ void thermal_zone_device_update(struct thermal_zone_device *, struct thermal_cooling_device *thermal_cooling_device_register(const char *, void *, const struct thermal_cooling_device_ops *); struct thermal_cooling_device * -thermal_of_cooling_device_register(struct device_node *np, const char *, void *, - const struct thermal_cooling_device_ops *); +thermal_of_cooling_device_register(struct device *parent, struct device_node *np, const char *type, + void *devdata, const struct thermal_cooling_device_ops *); struct thermal_cooling_device * devm_thermal_of_cooling_device_register(struct device *dev, struct device_node *np, @@ -302,9 +302,8 @@ thermal_cooling_device_register(const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { return ERR_PTR(-ENODEV); } static inline struct thermal_cooling_device * -thermal_of_cooling_device_register(struct device_node *np, - const char *type, void *devdata, - const struct thermal_cooling_device_ops *ops) +thermal_of_cooling_device_register(struct device *parent, struct device_node *np, const char *type, + void *devdata, const struct thermal_cooling_device_ops *ops) { return ERR_PTR(-ENODEV); } static inline struct thermal_cooling_device * devm_thermal_of_cooling_device_register(struct device *dev, -- 2.39.5 From W_Armin at gmx.de Thu Nov 13 19:23:04 2025 From: W_Armin 
at gmx.de (Armin Wolf) Date: Fri, 14 Nov 2025 04:23:04 +0100 Subject: [PATCH RFC 3/8] ACPI: processor: Stop creating "device" sysfs link In-Reply-To: <20251114-thermal-device-v1-0-d8b442aae38b@gmx.de> References: <20251114-thermal-device-v1-0-d8b442aae38b@gmx.de> Message-ID: <20251114-thermal-device-v1-3-d8b442aae38b@gmx.de> The thermal core will soon automatically create sysfs links between the cooling device and its parent device. Stop manually creating the "device" sysfs link between the cooling device and the parent device to avoid a name collision. The "thermal_cooling" sysfs link however stays for backwards compatibility, as it does not suffer from a name collision. Signed-off-by: Armin Wolf --- drivers/acpi/processor_thermal.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/drivers/acpi/processor_thermal.c b/drivers/acpi/processor_thermal.c index c7b1dc5687ec..1ff10321eac5 100644 --- a/drivers/acpi/processor_thermal.c +++ b/drivers/acpi/processor_thermal.c @@ -323,6 +323,7 @@ int acpi_processor_thermal_init(struct acpi_processor *pr, dev_dbg(&device->dev, "registered as cooling_device%d\n", pr->cdev->id); + /* For backwards compatibility */ result = sysfs_create_link(&device->dev.kobj, &pr->cdev->device.kobj, "thermal_cooling"); @@ -332,19 +333,8 @@ int acpi_processor_thermal_init(struct acpi_processor *pr, goto err_thermal_unregister; } - result = sysfs_create_link(&pr->cdev->device.kobj, - &device->dev.kobj, - "device"); - if (result) { - dev_err(&pr->cdev->device, - "Failed to create sysfs link 'device'\n"); - goto err_remove_sysfs_thermal; - } - return 0; -err_remove_sysfs_thermal: - sysfs_remove_link(&device->dev.kobj, "thermal_cooling"); err_thermal_unregister: thermal_cooling_device_unregister(pr->cdev); @@ -356,7 +346,6 @@ void acpi_processor_thermal_exit(struct acpi_processor *pr, { if (pr->cdev) { sysfs_remove_link(&device->dev.kobj, "thermal_cooling"); - sysfs_remove_link(&pr->cdev->device.kobj, "device"); 
thermal_cooling_device_unregister(pr->cdev); pr->cdev = NULL; } -- 2.39.5 From manivannan.sadhasivam at oss.qualcomm.com Fri Nov 14 02:22:18 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Fri, 14 Nov 2025 15:52:18 +0530 Subject: [PATCH 0/2] wifi: ath: Use static calibration variant table for devicetree platforms Message-ID: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> Hi, This series aims to deprecate the usage of the "qcom,*calibration-variant" devicetree property to select the calibration variant for WLAN devices. This is necessary for WLAN devices connected over the PCI bus, as hardcoding device-specific information in the PCI devicetree node forces the node to be updated every time a new device variant is attached to the PCI slot. This approach is not scalable and causes a bad user experience. So to avoid relying on the "qcom,*calibration-variant" property, this series introduces a lookup based on a new static calibration variant table. The newly introduced helper, ath_get_calib_variant(), will parse the model name from devicetree and use it to do the variant lookup at runtime. The ath_calib_variant_table[] will hold all the model and calibration variant entries for the supported devices. Going forward, new entries will be added to this table to support calibration variants.
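The table lookup described above can be sketched in userspace C as follows. This is a minimal sketch and not the code from patch 1/2: `calib_variant_entry`, `calib_variant_table` and `lookup_calib_variant()` are hypothetical names, the model string is passed in as a parameter instead of being read from the devicetree root node, and only two of the posted table entries are reproduced. The sketch also assumes that a missing model string should simply yield no match, so it checks for NULL before comparing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for one model -> variant mapping. */
struct calib_variant_entry {
	const char *machine;	/* devicetree root "model" string */
	const char *variant;	/* calibration variant name */
};

/* Two entries copied from the table posted in patch 1/2. */
static const struct calib_variant_entry calib_variant_table[] = {
	{ "Lenovo ThinkPad X13s", "LE_X13S" },
	{ "Fairphone 5", "Fairphone_5" },
	{ NULL, NULL }	/* sentinel */
};

/*
 * Return the calibration variant for @model, or NULL when @model is
 * NULL or not present in the table.
 */
static const char *lookup_calib_variant(const char *model)
{
	const struct calib_variant_entry *entry;

	if (!model)
		return NULL;

	for (entry = calib_variant_table; entry->machine; entry++) {
		if (strcmp(entry->machine, model) == 0)
			return entry->variant;
	}

	return NULL;
}
```

A NULL return is the signal for the caller to try another source, which in the posted driver changes is the existing devicetree property lookup.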
Signed-off-by: Manivannan Sadhasivam --- Manivannan Sadhasivam (2): wifi: ath: Use static calibration variant table for devicetree platforms dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property .../bindings/net/wireless/qcom,ath10k.yaml | 1 + .../bindings/net/wireless/qcom,ath11k-pci.yaml | 3 +- .../bindings/net/wireless/qcom,ath11k.yaml | 1 + .../bindings/net/wireless/qcom,ath12k-wsi.yaml | 6 +- .../bindings/net/wireless/qcom,ipq5332-wifi.yaml | 2 +- drivers/net/wireless/ath/ath.h | 98 ++++++++++++++++++++++ drivers/net/wireless/ath/ath10k/core.c | 5 ++ drivers/net/wireless/ath/ath11k/core.c | 7 ++ 8 files changed, 115 insertions(+), 8 deletions(-) --- base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20251114-ath-variant-tbl-22865456a527 Best regards, -- Manivannan Sadhasivam From manivannan.sadhasivam at oss.qualcomm.com Fri Nov 14 02:22:19 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Fri, 14 Nov 2025 15:52:19 +0530 Subject: [PATCH 1/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> Message-ID: <20251114-ath-variant-tbl-v1-1-a9adfc49e3f3@oss.qualcomm.com> On devicetree platforms, ath{10k/11k} drivers rely on the presence of the 'qcom,*calibration-variant' property to select the correct calibration data for device variants with colliding IDs. But this property based selection has the downside that the property needs to be added to the devicetree node of the WLAN device, especially for PCI based devices. Currently, the users/vendors are forced to hardcode this property in the PCI device node. If a different device needs to be attached to the slot, then the devicetree node also has to be changed. This approach is not scalable and creates a bad user experience.
To get rid of this requirement, this commit introduces a static calibration variant table ath_calib_variant_table[], consisting of the platform model and the calibration variant for all upstream supported devices. The entries of this table are derived from the upstream DTS files. The newly introduced helper, ath_get_calib_variant(), will parse the model name from devicetree and use it to do the variant lookup during runtime. If the platform model name doesn't match, it will fall back to the devicetree property based lookup. Going forward, the devicetree based lookup will be deprecated and this table will be used exclusively for devices connected to the devicetree based host platforms. Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.2.0.c2-00204-QCAMSLSWPLZ-1 Signed-off-by: Manivannan Sadhasivam --- drivers/net/wireless/ath/ath.h | 98 ++++++++++++++++++++++++++++++++++ drivers/net/wireless/ath/ath10k/core.c | 5 ++ drivers/net/wireless/ath/ath11k/core.c | 7 +++ 3 files changed, 110 insertions(+) diff --git a/drivers/net/wireless/ath/ath.h b/drivers/net/wireless/ath/ath.h index 34654f710d8a1e63f65a47d4602e2035262a4d9e..d0a12151b7fc13355161c48ba1fb200e4617ed11 100644 --- a/drivers/net/wireless/ath/ath.h +++ b/drivers/net/wireless/ath/ath.h @@ -21,6 +21,7 @@ #include #include #include +#include #include /* @@ -336,4 +337,101 @@ static inline const char *ath_bus_type_to_string(enum ath_bus_type bustype) return ath_bus_type_strings[bustype]; } +static const struct __ath_calib_variant_table { + const char *machine; + const char *variant; +} ath_calib_variant_table[] = { + { "ALFA Network AP120C-AC", "ALFA-Network-AP120C-AC" }, + { "8devices Jalapeno", "8devices-Jalapeno" }, + { "Google cozmo board", "GO_COZMO" }, + { "Google damu board", "GO_DAMU" }, + { "Google fennel sku1 board", "GO_FENNEL" }, + { "Google fennel sku6 board", "GO_FENNEL" }, + { "Google fennel sku7 board", "GO_FENNEL" }, + { "Google fennel14 sku2 board", "GO_FENNEL14" }, + { "Google fennel14 sku0 board",
"GO_FENNEL14" }, + { "Google juniper sku16 board", "GO_JUNIPER" }, + { "Google makomo sku0 board", "GO_FENNEL14" }, + { "Google makomo sku1 board", "GO_FENNEL14" }, + { "MediaTek kakadu board sku22", "GO_KAKADU" }, + { "MediaTek kakadu board", "GO_KAKADU" }, + { "Google katsu board", "GO_KATSU" }, + { "Google katsu sku38 board", "GO_KATSU" }, + { "MediaTek kodama sku16 board", "GO_KODAMA" }, + { "MediaTek kodama sku272 board", "GO_KODAMA" }, + { "MediaTek kodama sku288 board", "GO_KODAMA" }, + { "MediaTek kodama sku32 board", "GO_KODAMA" }, + { "MediaTek krane sku0 board", "LE_Krane" }, + { "MediaTek krane sku176 board", "LE_Krane" }, + { "Qualcomm Technologies, Inc. Lemans Ride Rev3", "QC_SA8775P_Ride" }, + { "Qualcomm Technologies, Inc. Lemans Ride", "QC_SA8775P_Ride" }, + { "Qualcomm SA8775P Ride Rev3", "QC_SA8775P_Ride" }, + { "Qualcomm SA8775P Ride", "QC_SA8775P_Ride" }, + { "Lenovo Miix 630", "Lenovo_Miix630" }, + { "Fairphone 5", "Fairphone_5" }, + { "Qualcomm Technologies, Inc. QCM6490 IDP", "Qualcomm_qcm6490idp" }, + { "SHIFT SHIFTphone 8", "SHIFTphone_8" }, + { "Qualcomm Technologies, Inc. QCS615 Ride", "QC_QCS615_Ride" }, + { "Qualcomm Technologies, Inc. Robotics RB3gen2", "Qualcomm_rb3gen2" }, + { "Qualcomm Technologies, Inc. Robotics RB1", "Thundercomm_RB1" }, + { "Qualcomm Technologies, Inc. 
QRB4210 RB2", "Thundercomm_RB2" }, + { "Google Homestar (rev2)", "GO_HOMESTAR" }, + { "Google Homestar (rev3)", "GO_HOMESTAR" }, + { "Google Homestar (rev4+)", "GO_HOMESTAR" }, + { "Google Kingoftown", "GO_KINGOFTOWN" }, + { "Google Lazor Limozeen without Touchscreen (rev10+)", "GO_LAZOR" }, + { "Google Lazor Limozeen without Touchscreen (rev5 - rev8)", "GO_LAZOR" }, + { "Google Lazor Limozeen without Touchscreen (rev9)", "GO_LAZOR" }, + { "Google Lazor Limozeen (rev10+)", "GO_LAZOR" }, + { "Google Lazor Limozeen (rev4 - rev8)", "GO_LAZOR" }, + { "Google Lazor Limozeen (rev9)", "GO_LAZOR" }, + { "Google Lazor (rev1 - 2)", "GO_LAZOR" }, + { "Google Lazor (rev10+) with KB Backlight", "GO_LAZOR" }, + { "Google Lazor (rev10+) with LTE", "GO_LAZOR" }, + { "Google Lazor (rev10+)", "GO_LAZOR" }, + { "Google Lazor (rev3 - 8) with KB Backlight", "GO_LAZOR" }, + { "Google Lazor (rev3 - 8) with LTE", "GO_LAZOR" }, + { "Google Lazor (rev3 - 8)", "GO_LAZOR" }, + { "Google Lazor (rev9) with KB Backlight", "GO_LAZOR" }, + { "Google Lazor (rev9) with LTE", "GO_LAZOR" }, + { "Google Lazor (rev9)", "GO_LAZOR" }, + { "Google Pazquel (Parade,LTE)", "GO_PAZQUEL360" }, + { "Google Pazquel (Parade,WIFI-only)", "GO_PAZQUEL360" }, + { "Google Pompom (rev1)", "GO_POMPOM" }, + { "Google Pompom (rev2)", "GO_POMPOM" }, + { "Google Pompom (rev3+)", "GO_POMPOM" }, + { "Google Wormdingler rev1+ (BOE, rt5682s)", "GO_WORMDINGLER" }, + { "Google Wormdingler rev1+ BOE panel board", "GO_WORMDINGLER" }, + { "Google Wormdingler rev1+ (INX, rt5682s)", "GO_WORMDINGLER" }, + { "Google Wormdingler rev1+ INX panel board", "GO_WORMDINGLER" }, + { "Qualcomm SC8280XP CRD", "QC_8280XP_CRD" }, + { "Lenovo ThinkPad X13s", "LE_X13S" }, + { "Microsoft Surface Pro 9 5G", "MS_SP9_5G" }, + { "Windows Dev Kit 2023", "MS_Volterra" }, + { "Inforce 6560 Single Board Computer", "Inforce_IFC6560" }, + { "Thundercomm Dragonboard 845c", "Thundercomm_DB845C" }, + { "Qualcomm Technologies, Inc. 
SDM845 MTP", "Qualcomm_sdm845mtp" }, + { "Lenovo Yoga C630", "Lenovo_C630" }, + { "F(x)tec Pro1X (QX1050)", "Fxtec_QX1050" }, + { "Lenovo Tab P11", "Lenovo_P11" }, + { "Qualcomm Technologies, Inc. SM8150 HDK", "Qualcomm_sm8150hdk" }, + { "Xiaomi Mi Pad 5 Pro (BOE)", "Xiaomi_Pad_5Pro" }, + { "Xiaomi Mi Pad 5 Pro (CSOT)", "Xiaomi_Pad_5Pro" }, + { "ASUS Zenbook A14 (UX3407QA)", "UX3407Q" }, + { "Google Scarlet", "GO_DUMO" }, + { /* Sentinel */ } +}; + +static inline const char *ath_get_calib_variant(void) +{ + const struct __ath_calib_variant_table *entry = ath_calib_variant_table; + struct device_node *root __free(device_node) = of_find_node_by_path("/"); + const char *model = of_get_property(root, "model", NULL); + + while ((entry->machine) && strcmp(entry->machine, model)) + entry++; + + return entry->machine ? entry->variant : NULL; +} + #endif /* ATH_H */ diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c index 6f78f1752cd6ffcf8eb56621ba0e4978ac23e696..099b8592d50bfac37c54dee7b0aa660ac126a410 100644 --- a/drivers/net/wireless/ath/ath10k/core.c +++ b/drivers/net/wireless/ath/ath10k/core.c @@ -1161,6 +1161,10 @@ int ath10k_core_check_dt(struct ath10k *ar) struct device_node *node; const char *variant = NULL; + variant = ath_get_calib_variant(); + if (variant) + goto copy_variant; + node = ar->dev->of_node; if (!node) return -ENOENT; @@ -1173,6 +1177,7 @@ int ath10k_core_check_dt(struct ath10k *ar) if (!variant) return -ENODATA; +copy_variant: if (strscpy(ar->id.bdf_ext, variant, sizeof(ar->id.bdf_ext)) < 0) ath10k_dbg(ar, ATH10K_DBG_BOOT, "bdf variant string is longer than the buffer can accommodate (variant: %s)\n", diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c index 2810752260f2f7eee226f88d5aea7cdabe7e9ed4..2db067d6357c8848ede7384ec4a615ca22282650 100644 --- a/drivers/net/wireless/ath/ath11k/core.c +++ b/drivers/net/wireless/ath/ath11k/core.c @@ -20,6 +20,8 @@ #include 
"wow.h" #include "fw.h" +#include "../ath.h" + unsigned int ath11k_debug_mask; EXPORT_SYMBOL(ath11k_debug_mask); module_param_named(debug_mask, ath11k_debug_mask, uint, 0644); @@ -1362,6 +1364,10 @@ int ath11k_core_check_dt(struct ath11k_base *ab) const char *variant = NULL; struct device_node *node; + variant = ath_get_calib_variant(); + if (variant) + goto copy_variant; + node = ab->dev->of_node; if (!node) return -ENOENT; @@ -1374,6 +1380,7 @@ int ath11k_core_check_dt(struct ath11k_base *ab) if (!variant) return -ENODATA; +copy_variant: if (strscpy(ab->qmi.target.bdf_ext, variant, max_len) < 0) ath11k_dbg(ab, ATH11K_DBG_BOOT, "bdf variant string is longer than the buffer can accommodate (variant: %s)\n", -- 2.48.1 From manivannan.sadhasivam at oss.qualcomm.com Fri Nov 14 02:22:20 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Fri, 14 Nov 2025 15:52:20 +0530 Subject: [PATCH 2/2] dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property In-Reply-To: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> Message-ID: <20251114-ath-variant-tbl-v1-2-a9adfc49e3f3@oss.qualcomm.com> On devicetree platforms, ath{10k/11k} drivers rely on the presence of the 'qcom,calibration-variant' property to select the correct calibration data for device variants with colliding IDs. But this property based selection has its own downside that it needs to be added to the devicetree node of the WLAN device, especially for PCI based devices. Currently, the users/vendors are forced to hardcode this property in the PCI device node. If a different device need to be attached to the slot, then the devicetree node also has to be changed. This approach is not scalable and creates a bad user experience. So deprecate this property from WLAN devicetree nodes and let the drivers do the devicetree model based calibration variant lookup using a static table. 
This also warrants removing the property from examples in the binding. Signed-off-by: Manivannan Sadhasivam --- Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml | 1 + Documentation/devicetree/bindings/net/wireless/qcom,ath11k-pci.yaml | 3 +-- Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml | 1 + Documentation/devicetree/bindings/net/wireless/qcom,ath12k-wsi.yaml | 6 +----- .../devicetree/bindings/net/wireless/qcom,ipq5332-wifi.yaml | 2 +- 5 files changed, 5 insertions(+), 8 deletions(-) diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml index f2440d39b7ebcda77db592de85573bec902fb334..efe11bdec30dcdb6d48185b68093ea8c247b8c3d 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml @@ -107,6 +107,7 @@ properties: qcom,calibration-variant: $ref: /schemas/types.yaml#/definitions/string + deprecated: true description: Unique variant identifier of the calibration data in board-2.bin for designs with colliding bus and device specific ids diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k-pci.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k-pci.yaml index e34d42a30192b80311a4c6bb41bd3c8ffc66375f..df7d7aae3343168ffa92bcce16a0b429a6d7bfef 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k-pci.yaml +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k-pci.yaml @@ -24,6 +24,7 @@ properties: qcom,calibration-variant: $ref: /schemas/types.yaml#/definitions/string + deprecated: true description: | string to uniquely identify variant of the calibration data for designs with colliding bus and device ids @@ -139,8 +140,6 @@ examples: vddrfa0p8-supply = <&vreg_pmu_rfa_0p8>; vddrfa1p2-supply = <&vreg_pmu_rfa_1p2>; vddrfa1p8-supply = <&vreg_pmu_rfa_1p7>; - - qcom,calibration-variant = "LE_X13S"; }; }; }; 
diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml index c089677702cf17f3016b054d21494d2a7706ce5d..45ae5d3ca73b75b0755466f4dd92df1625dcb4c1 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml @@ -43,6 +43,7 @@ properties: qcom,calibration-variant: $ref: /schemas/types.yaml#/definitions/string + deprecated: true description: string to uniquely identify variant of the calibration data in the board-2.bin for designs with colliding bus and device specific ids diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath12k-wsi.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ath12k-wsi.yaml index 589960144fe1d56eb6f15f63a2d594210e045d27..cd6604eab5f3608811805d204a4c59ce1dcc060a 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath12k-wsi.yaml +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath12k-wsi.yaml @@ -54,6 +54,7 @@ properties: qcom,calibration-variant: $ref: /schemas/types.yaml#/definitions/string + deprecated: true description: String to uniquely identify variant of the calibration data for designs with colliding bus and device ids @@ -110,8 +111,6 @@ examples: compatible = "pci17cb,1109"; reg = <0x0 0x0 0x0 0x0 0x0>; - qcom,calibration-variant = "RDP433_1"; - ports { #address-cells = <1>; #size-cells = <0>; @@ -146,7 +145,6 @@ examples: compatible = "pci17cb,1109"; reg = <0x0 0x0 0x0 0x0 0x0>; - qcom,calibration-variant = "RDP433_2"; qcom,wsi-controller; ports { @@ -183,8 +181,6 @@ examples: compatible = "pci17cb,1109"; reg = <0x0 0x0 0x0 0x0 0x0>; - qcom,calibration-variant = "RDP433_3"; - ports { #address-cells = <1>; #size-cells = <0>; diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ipq5332-wifi.yaml b/Documentation/devicetree/bindings/net/wireless/qcom,ipq5332-wifi.yaml index 
363a0ecb6ad97c3dce72881ff552d238d08a2c12..1e6ff8e7a6c20cbe4abe31cacd8b25a78af05f4c 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ipq5332-wifi.yaml +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ipq5332-wifi.yaml @@ -151,6 +151,7 @@ properties: qcom,calibration-variant: $ref: /schemas/types.yaml#/definitions/string + deprecated: true description: String to uniquely identify variant of the calibration data for designs with colliding bus and device ids @@ -304,7 +305,6 @@ examples: memory-region = <&q6_region>, <&m3_dump>, <&q6_caldb>, <&mlo_mem>; memory-region-names = "q6-region", "m3-dump", "q6-caldb", "mlo-global-mem"; - qcom,calibration-variant = "RDP441_1"; qcom,rproc = <&q6v5_wcss>; qcom,smem-states = <&wcss_smp2p_out 8>, <&wcss_smp2p_out 9>, -- 2.48.1 From krzk at kernel.org Fri Nov 14 02:45:30 2025 From: krzk at kernel.org (Krzysztof Kozlowski) Date: Fri, 14 Nov 2025 11:45:30 +0100 Subject: [PATCH 1/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: <20251114-ath-variant-tbl-v1-1-a9adfc49e3f3@oss.qualcomm.com> References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-1-a9adfc49e3f3@oss.qualcomm.com> Message-ID: <3a951821-14b1-464e-b1da-05a95f4164af@kernel.org> On 14/11/2025 11:22, Manivannan Sadhasivam wrote: > On devicetree platforms, ath{10k/11k} drivers rely on the presence of the > 'qcom,*calibration-variant' property to select the correct calibration data > for device variants with colliding IDs. > > But this property based selection has its own downside that it needs to be > added to the devicetree node of the WLAN device, especially for PCI based > devices. Currently, the users/vendors are forced to hardcode this property > in the PCI device node. If a different device need to be attached to the > slot, then the devicetree node also has to be changed. This approach is not > scalable and creates a bad user experience. 
> > To get rid of this requirement, this commit introduces a static calibration > variant table ath_calib_variant_table[], consisting of the platform model > and the calibration variant for all upstream supported devices. The entries > of this table are derived from the upstream DTS files. > > The newly introduced helper, ath_get_calib_variant() will parse the model > name from devicetree and use it to do the variant lookup during runtime. If > the platform model name doesn't match, it will fallback to the devicetree > property based lookup. > > Going forward, the devicetree based lookup will be deprecated and this > table will be used exclusively for devices connected to the devicetree > based host platforms. > > Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.2.0.c2-00204-QCAMSLSWPLZ-1 > > Signed-off-by: Manivannan Sadhasivam > --- > drivers/net/wireless/ath/ath.h | 98 ++++++++++++++++++++++++++++++++++ > drivers/net/wireless/ath/ath10k/core.c | 5 ++ > drivers/net/wireless/ath/ath11k/core.c | 7 +++ > 3 files changed, 110 insertions(+) > > diff --git a/drivers/net/wireless/ath/ath.h b/drivers/net/wireless/ath/ath.h > index 34654f710d8a1e63f65a47d4602e2035262a4d9e..d0a12151b7fc13355161c48ba1fb200e4617ed11 100644 > --- a/drivers/net/wireless/ath/ath.h > +++ b/drivers/net/wireless/ath/ath.h > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > #include > > /* > @@ -336,4 +337,101 @@ static inline const char *ath_bus_type_to_string(enum ath_bus_type bustype) > return ath_bus_type_strings[bustype]; > } > > +static const struct __ath_calib_variant_table { > + const char *machine; > + const char *variant; > +} ath_calib_variant_table[] = { > + { "ALFA Network AP120C-AC", "ALFA-Network-AP120C-AC" }, > + { "8devices Jalapeno", "8devices-Jalapeno" }, > + { "Google cozmo board", "GO_COZMO" }, > + { "Google damu board", "GO_DAMU" }, > + { "Google fennel sku1 board", "GO_FENNEL" }, > + { "Google fennel sku6 board", "GO_FENNEL" }, > + { "Google fennel sku7 board", 
"GO_FENNEL" }, Are these top-machine models? If so, you cannot use them. The value is user-informative, not ABI. If you wanted to use them, you would need to document the ABI. Just use compatible, that's the entire point of compatible. Best regards, Krzysztof From krzk at kernel.org Fri Nov 14 02:47:25 2025 From: krzk at kernel.org (Krzysztof Kozlowski) Date: Fri, 14 Nov 2025 11:47:25 +0100 Subject: [PATCH 2/2] dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property In-Reply-To: <20251114-ath-variant-tbl-v1-2-a9adfc49e3f3@oss.qualcomm.com> References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-2-a9adfc49e3f3@oss.qualcomm.com> Message-ID: On 14/11/2025 11:22, Manivannan Sadhasivam wrote: > On devicetree platforms, ath{10k/11k} drivers rely on the presence of the > 'qcom,calibration-variant' property to select the correct calibration data > for device variants with colliding IDs. > > But this property based selection has its own downside that it needs to be > added to the devicetree node of the WLAN device, especially for PCI based > devices. Currently, the users/vendors are forced to hardcode this property > in the PCI device node. If a different device need to be attached to the > slot, then the devicetree node also has to be changed. This approach is not > scalable and creates a bad user experience. > > So deprecate this property from WLAN devicetree nodes and let the drivers > do the devicetree model based calibration variant lookup using a static > table. > > This also warrants removing the property from examples in the binding. > > Signed-off-by: Manivannan Sadhasivam > --- The problem - visible in one of the examples here - is that one board has multiple WiFi chips and they use different calibration-variant properties. How do you find the right calibration variant for such case based on board machine match? 
Best regards, Krzysztof From manivannan.sadhasivam at oss.qualcomm.com Fri Nov 14 03:02:15 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Fri, 14 Nov 2025 16:32:15 +0530 Subject: [PATCH 2/2] dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-2-a9adfc49e3f3@oss.qualcomm.com> Message-ID: On Fri, Nov 14, 2025 at 11:47:25AM +0100, Krzysztof Kozlowski wrote: > On 14/11/2025 11:22, Manivannan Sadhasivam wrote: > > On devicetree platforms, ath{10k/11k} drivers rely on the presence of the > > 'qcom,calibration-variant' property to select the correct calibration data > > for device variants with colliding IDs. > > > > But this property based selection has its own downside that it needs to be > > added to the devicetree node of the WLAN device, especially for PCI based > > devices. Currently, the users/vendors are forced to hardcode this property > > in the PCI device node. If a different device need to be attached to the > > slot, then the devicetree node also has to be changed. This approach is not > > scalable and creates a bad user experience. > > > > So deprecate this property from WLAN devicetree nodes and let the drivers > > do the devicetree model based calibration variant lookup using a static > > table. > > > > This also warrants removing the property from examples in the binding. > > > > Signed-off-by: Manivannan Sadhasivam > > --- > > The problem - visible in one of the examples here - is that one board > has multiple WiFi chips and they use different calibration-variant > properties. How do you find the right calibration variant for such case > based on board machine match? > I suspect the legitimacy of the example here. I don't understand how a single machine can have same devices with 3 different calibration data. AFAIU, calibration data is specific to the platform design. 
And I don't see any upstream supported devicetree having similar properties. - Mani -- மணிவண்ணன் சதாசிவம் From krzk at kernel.org Fri Nov 14 03:04:55 2025 From: krzk at kernel.org (Krzysztof Kozlowski) Date: Fri, 14 Nov 2025 12:04:55 +0100 Subject: [PATCH 2/2] dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-2-a9adfc49e3f3@oss.qualcomm.com> Message-ID: <1703d8d7-5105-4585-b8f0-82bb54809718@kernel.org> On 14/11/2025 12:02, Manivannan Sadhasivam wrote: > On Fri, Nov 14, 2025 at 11:47:25AM +0100, Krzysztof Kozlowski wrote: >> On 14/11/2025 11:22, Manivannan Sadhasivam wrote: >>> On devicetree platforms, ath{10k/11k} drivers rely on the presence of the >>> 'qcom,calibration-variant' property to select the correct calibration data >>> for device variants with colliding IDs. >>> >>> But this property based selection has its own downside that it needs to be >>> added to the devicetree node of the WLAN device, especially for PCI based >>> devices. Currently, the users/vendors are forced to hardcode this property >>> in the PCI device node. If a different device need to be attached to the >>> slot, then the devicetree node also has to be changed. This approach is not >>> scalable and creates a bad user experience. >>> >>> So deprecate this property from WLAN devicetree nodes and let the drivers >>> do the devicetree model based calibration variant lookup using a static >>> table. >>> >>> This also warrants removing the property from examples in the binding. >>> >>> Signed-off-by: Manivannan Sadhasivam >>> --- >> >> The problem - visible in one of the examples here - is that one board >> has multiple WiFi chips and they use different calibration-variant >> properties. How do you find the right calibration variant for such case >> based on board machine match? >> > > I suspect the legitimacy of the example here. 
I don't understand how a single > machine can have same devices with 3 different calibration data. Me neither but I am not the domain expert here. > > AFAIU, calibration data is specific to the platform design. And I don't see any > upstream supported devicetree having similar properties. Deprecating these is fine with me, but I would prefer if we get here some clear answers that mentioned case cannot happen. If you are sure of that, please mention it in commit msg. Best regards, Krzysztof From manivannan.sadhasivam at oss.qualcomm.com Fri Nov 14 03:16:00 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Fri, 14 Nov 2025 16:46:00 +0530 Subject: [PATCH 1/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: <3a951821-14b1-464e-b1da-05a95f4164af@kernel.org> References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-1-a9adfc49e3f3@oss.qualcomm.com> <3a951821-14b1-464e-b1da-05a95f4164af@kernel.org> Message-ID: + Srini On Fri, Nov 14, 2025 at 11:45:30AM +0100, Krzysztof Kozlowski wrote: > On 14/11/2025 11:22, Manivannan Sadhasivam wrote: > > On devicetree platforms, ath{10k/11k} drivers rely on the presence of the > > 'qcom,*calibration-variant' property to select the correct calibration data > > for device variants with colliding IDs. > > > > But this property based selection has its own downside that it needs to be > > added to the devicetree node of the WLAN device, especially for PCI based > > devices. Currently, the users/vendors are forced to hardcode this property > > in the PCI device node. If a different device need to be attached to the > > slot, then the devicetree node also has to be changed. This approach is not > > scalable and creates a bad user experience. 
> > > > To get rid of this requirement, this commit introduces a static calibration > > variant table ath_calib_variant_table[], consisting of the platform model > > and the calibration variant for all upstream supported devices. The entries > > of this table are derived from the upstream DTS files. > > > > The newly introduced helper, ath_get_calib_variant() will parse the model > > name from devicetree and use it to do the variant lookup during runtime. If > > the platform model name doesn't match, it will fallback to the devicetree > > property based lookup. > > > > Going forward, the devicetree based lookup will be deprecated and this > > table will be used exclusively for devices connected to the devicetree > > based host platforms. > > > > Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.2.0.c2-00204-QCAMSLSWPLZ-1 > > > > Signed-off-by: Manivannan Sadhasivam > > --- > > drivers/net/wireless/ath/ath.h | 98 ++++++++++++++++++++++++++++++++++ > > drivers/net/wireless/ath/ath10k/core.c | 5 ++ > > drivers/net/wireless/ath/ath11k/core.c | 7 +++ > > 3 files changed, 110 insertions(+) > > > > diff --git a/drivers/net/wireless/ath/ath.h b/drivers/net/wireless/ath/ath.h > > index 34654f710d8a1e63f65a47d4602e2035262a4d9e..d0a12151b7fc13355161c48ba1fb200e4617ed11 100644 > > --- a/drivers/net/wireless/ath/ath.h > > +++ b/drivers/net/wireless/ath/ath.h > > @@ -21,6 +21,7 @@ > > #include > > #include > > #include > > +#include > > #include > > > > /* > > @@ -336,4 +337,101 @@ static inline const char *ath_bus_type_to_string(enum ath_bus_type bustype) > > return ath_bus_type_strings[bustype]; > > } > > > > +static const struct __ath_calib_variant_table { > > + const char *machine; > > + const char *variant; > > +} ath_calib_variant_table[] = { > > + { "ALFA Network AP120C-AC", "ALFA-Network-AP120C-AC" }, > > + { "8devices Jalapeno", "8devices-Jalapeno" }, > > + { "Google cozmo board", "GO_COZMO" }, > > + { "Google damu board", "GO_DAMU" }, > > + { "Google fennel sku1 board", 
"GO_FENNEL" }, > > + { "Google fennel sku6 board", "GO_FENNEL" }, > > + { "Google fennel sku7 board", "GO_FENNEL" }, > > Are these top-machine models? If so, you cannot use them. The value is > user-informative, not ABI. If you wanted to use them, you would need to > document the ABI. > I had this question initially, but Srini convinced me it is OK to use it in the driver as they do it in audio :) > Just use compatible, that's the entire point of compatible. > Ok! - Mani -- ????????? ???????? From manivannan.sadhasivam at oss.qualcomm.com Fri Nov 14 03:18:32 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Fri, 14 Nov 2025 16:48:32 +0530 Subject: [PATCH 2/2] dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property In-Reply-To: <1703d8d7-5105-4585-b8f0-82bb54809718@kernel.org> References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-2-a9adfc49e3f3@oss.qualcomm.com> <1703d8d7-5105-4585-b8f0-82bb54809718@kernel.org> Message-ID: On Fri, Nov 14, 2025 at 12:04:55PM +0100, Krzysztof Kozlowski wrote: > On 14/11/2025 12:02, Manivannan Sadhasivam wrote: > > On Fri, Nov 14, 2025 at 11:47:25AM +0100, Krzysztof Kozlowski wrote: > >> On 14/11/2025 11:22, Manivannan Sadhasivam wrote: > >>> On devicetree platforms, ath{10k/11k} drivers rely on the presence of the > >>> 'qcom,calibration-variant' property to select the correct calibration data > >>> for device variants with colliding IDs. > >>> > >>> But this property based selection has its own downside that it needs to be > >>> added to the devicetree node of the WLAN device, especially for PCI based > >>> devices. Currently, the users/vendors are forced to hardcode this property > >>> in the PCI device node. If a different device need to be attached to the > >>> slot, then the devicetree node also has to be changed. This approach is not > >>> scalable and creates a bad user experience. 
> >>> > >>> So deprecate this property from WLAN devicetree nodes and let the drivers > >>> do the devicetree model based calibration variant lookup using a static > >>> table. > >>> > >>> This also warrants removing the property from examples in the binding. > >>> > >>> Signed-off-by: Manivannan Sadhasivam > >>> --- > >> > >> The problem - visible in one of the examples here - is that one board > >> has multiple WiFi chips and they use different calibration-variant > >> properties. How do you find the right calibration variant for such case > >> based on board machine match? > >> > > > > I suspect the legitimacy of the example here. I don't understand how a single > > machine can have same devices with 3 different calibration data. > > Me neither but I am not the domain expert here. > > > > > AFAIU, calibration data is specific to the platform design. And I don't see any > > upstream supported devicetree having similar properties. > Deprecating these is fine with me, but I would prefer if we get here > some clear answers that mentioned case cannot happen. If you are sure of > that, please mention it in commit msg. > I'm pretty sure that this example is wrong. But I will wait for Jeff or other ath developers to confirm. - Mani -- மணிவண்ணன் சதாசிவம் 
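For readers following the two threads, the lookup being discussed (match the machine against a static table, otherwise fall back to the devicetree property) can be sketched roughly as below. This is only an illustration in plain userspace C, not the code from the posted patch: the struct, the table name, lookup_calib_variant() and the "example,board-a" entry are made up for this sketch, and the compatible string for the fennel board is an assumption. The sketch keys the table on machine compatible strings rather than on the 'model' text, following the review comment that compatible is the ABI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: machine compatible -> calibration variant. */
struct calib_variant_entry {
	const char *compatible;
	const char *variant;
};

/*
 * Illustrative table only. "GO_FENNEL" is a variant string from the
 * posted patch; the compatible keys are assumptions for this sketch.
 */
static const struct calib_variant_entry calib_variant_table[] = {
	{ "google,fennel-sku7", "GO_FENNEL" },
	{ "example,board-a",    "EXAMPLE_A" },	/* made-up board */
};

/*
 * compatibles: the root node compatible list, most specific first.
 * dt_variant:  value of the per-device calibration variant property,
 *              or NULL when the property is absent.
 *
 * Returns the table variant on a match, otherwise dt_variant
 * (which may be NULL).
 */
static const char *lookup_calib_variant(const char *const *compatibles,
					size_t n_compatibles,
					const char *dt_variant)
{
	size_t n_entries = sizeof(calib_variant_table) /
			   sizeof(calib_variant_table[0]);
	size_t i, j;

	for (i = 0; i < n_compatibles; i++) {
		for (j = 0; j < n_entries; j++) {
			if (strcmp(compatibles[i],
				   calib_variant_table[j].compatible) == 0)
				return calib_variant_table[j].variant;
		}
	}

	/* No table match: fall back to the property based value. */
	return dt_variant;
}
```

Note that a table keyed only on the machine yields a single variant per board, so it cannot by itself tell apart several WLAN devices on the same board; the property fallback is the only per-device input in this sketch.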
From krzk at kernel.org Fri Nov 14 03:24:38 2025 From: krzk at kernel.org (Krzysztof Kozlowski) Date: Fri, 14 Nov 2025 12:24:38 +0100 Subject: [PATCH 1/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-1-a9adfc49e3f3@oss.qualcomm.com> <3a951821-14b1-464e-b1da-05a95f4164af@kernel.org> Message-ID: On 14/11/2025 12:16, Manivannan Sadhasivam wrote: >>> >>> +static const struct __ath_calib_variant_table { >>> + const char *machine; >>> + const char *variant; >>> +} ath_calib_variant_table[] = { >>> + { "ALFA Network AP120C-AC", "ALFA-Network-AP120C-AC" }, >>> + { "8devices Jalapeno", "8devices-Jalapeno" }, >>> + { "Google cozmo board", "GO_COZMO" }, >>> + { "Google damu board", "GO_DAMU" }, >>> + { "Google fennel sku1 board", "GO_FENNEL" }, >>> + { "Google fennel sku6 board", "GO_FENNEL" }, >>> + { "Google fennel sku7 board", "GO_FENNEL" }, >> >> Are these top-machine models? If so, you cannot use them. The value is >> user-informative, not ABI. If you wanted to use them, you would need to >> document the ABI. >> > > I had this question initially, but Srini convinced me it is OK to use it in the > driver as they do it in audio :) That's sounds like an issue which could be fixed or at least discussed. There is no in-kernel usage of ASoC's 'model' property, thus we probably never noticed that it is an ABI. OTOH, everyone apparently knows that audio's 'model' is an ABI because no one changes it, unlike top-level machine 'model' which is being changed from time to time. 
Best regards, Krzysztof From srinivas.kandagatla at oss.qualcomm.com Fri Nov 14 03:44:18 2025 From: srinivas.kandagatla at oss.qualcomm.com (Srinivas Kandagatla) Date: Fri, 14 Nov 2025 11:44:18 +0000 Subject: [PATCH 1/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-1-a9adfc49e3f3@oss.qualcomm.com> <3a951821-14b1-464e-b1da-05a95f4164af@kernel.org> Message-ID: <70322e09-694a-471d-b4fc-f5a8a1c01450@oss.qualcomm.com> On 11/14/25 11:24 AM, Krzysztof Kozlowski wrote: > On 14/11/2025 12:16, Manivannan Sadhasivam wrote: >>>> >>>> +static const struct __ath_calib_variant_table { >>>> + const char *machine; >>>> + const char *variant; >>>> +} ath_calib_variant_table[] = { >>>> + { "ALFA Network AP120C-AC", "ALFA-Network-AP120C-AC" }, >>>> + { "8devices Jalapeno", "8devices-Jalapeno" }, >>>> + { "Google cozmo board", "GO_COZMO" }, >>>> + { "Google damu board", "GO_DAMU" }, >>>> + { "Google fennel sku1 board", "GO_FENNEL" }, >>>> + { "Google fennel sku6 board", "GO_FENNEL" }, >>>> + { "Google fennel sku7 board", "GO_FENNEL" }, >>> >>> Are these top-machine models? If so, you cannot use them. The value is >>> user-informative, not ABI. If you wanted to use them, you would need to >>> document the ABI. the value has expected format, can it not be an ABI?, from DT Specs: "Specifies a string that uniquely identifies the model of the system board" We can argue that its not part of Documentation/devicetree/bindings/arm/qcom.yaml @Mani, can you not use the top level machine compatibles instead, something like: "google,fennel-sku7" instead of "Google fennel sku7 board" which is an ABI. >>> >> >> I had this question initially, but Srini convinced me it is OK to use it in the >> driver as they do it in audio :) > > That's sounds like an issue which could be fixed or at least discussed. 
> There is no in-kernel usage of ASoC's 'model' property, thus we probably > never noticed that it is an ABI. > model is actually used as soundcard name and long name if there is no DMI info for the platform, This string is also used at the UCM level to identify the correct UCM configuration. However the model that we are referring for sound is part of the dt-bindings for the sound card, not the top-level model, so this is an ABI for soundcard itself. --srini > OTOH, everyone apparently knows that audio's 'model' is an ABI because > no one changes it, unlike top-level machine 'model' which is being > changed from time to time. > > Best regards, > Krzysztof From krzk at kernel.org Fri Nov 14 03:48:59 2025 From: krzk at kernel.org (Krzysztof Kozlowski) Date: Fri, 14 Nov 2025 12:48:59 +0100 Subject: [PATCH 1/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: <70322e09-694a-471d-b4fc-f5a8a1c01450@oss.qualcomm.com> References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-1-a9adfc49e3f3@oss.qualcomm.com> <3a951821-14b1-464e-b1da-05a95f4164af@kernel.org> <70322e09-694a-471d-b4fc-f5a8a1c01450@oss.qualcomm.com> Message-ID: On 14/11/2025 12:44, Srinivas Kandagatla wrote: > On 11/14/25 11:24 AM, Krzysztof Kozlowski wrote: >> On 14/11/2025 12:16, Manivannan Sadhasivam wrote: >>>>> >>>>> +static const struct __ath_calib_variant_table { >>>>> + const char *machine; >>>>> + const char *variant; >>>>> +} ath_calib_variant_table[] = { >>>>> + { "ALFA Network AP120C-AC", "ALFA-Network-AP120C-AC" }, >>>>> + { "8devices Jalapeno", "8devices-Jalapeno" }, >>>>> + { "Google cozmo board", "GO_COZMO" }, >>>>> + { "Google damu board", "GO_DAMU" }, >>>>> + { "Google fennel sku1 board", "GO_FENNEL" }, >>>>> + { "Google fennel sku6 board", "GO_FENNEL" }, >>>>> + { "Google fennel sku7 board", "GO_FENNEL" }, >>>> >>>> Are these top-machine models? If so, you cannot use them. 
The value is >>>> user-informative, not ABI. If you wanted to use them, you would need to >>>> document the ABI. > > the value has expected format, can it not be an ABI?, from DT Specs: Where is the ABI documented? You should not have ABI which is completely undocumented. > "Specifies a string that uniquely identifies the model of the system > board" We can argue that its not part of > Documentation/devicetree/bindings/arm/qcom.yaml > > @Mani, can you not use the top level machine compatibles instead, > something like: "google,fennel-sku7" instead of "Google fennel sku7 > board" which is an ABI. > >>>> >>> >>> I had this question initially, but Srini convinced me it is OK to use it in the >>> driver as they do it in audio :) >> >> That's sounds like an issue which could be fixed or at least discussed. >> There is no in-kernel usage of ASoC's 'model' property, thus we probably >> never noticed that it is an ABI. >> > model is actually used as soundcard name and long name if there is no > DMI info for the platform, This string is also used at the UCM level to > identify the correct UCM configuration. You speak about user-space... I did not dispute that. I said - it is not used in the kernel. > > > However the model that we are referring for sound is part of the > dt-bindings for the sound card, not the top-level model, so this is an > ABI for soundcard itself. We speak about the values. They are not defined as ABI and not used in the kernel. Best regards, Krzysztof From rafael at kernel.org Fri Nov 14 04:13:12 2025 From: rafael at kernel.org (Rafael J. 
Wysocki) Date: Fri, 14 Nov 2025 13:13:12 +0100 Subject: [PATCH RFC 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices In-Reply-To: <20251114-thermal-device-v1-0-d8b442aae38b@gmx.de> References: <20251114-thermal-device-v1-0-d8b442aae38b@gmx.de> Message-ID: On Fri, Nov 14, 2025 at 4:24 AM Armin Wolf wrote: > > Drivers registering thermal zone/cooling devices are currently unable > to tell the thermal core what parent device the new thermal zone/ > cooling device should have, potentially causing issues with suspend > ordering Do you have any examples of this? > and making it impossible for user space appications to > associate a given thermal zone device with its parent device. > > This patch series aims to fix this issue by extending the functions > used to register thermal zone/cooling devices to also accept a parent > device pointer. The first six patches convert all functions used for > registering cooling devices, while the functions used for registering > thermal zone devices are converted by the remaining two patches. > > I tested this series on various devices containing (among others): > - ACPI thermal zones > - ACPI processor devices > - PCIe cooling devices > - Intel Wifi card > - Intel powerclamp > - Intel TCC cooling > > I also compile-tested the remaining affected drivers, however i would > still be happy if the relevant maintainers (especially those of the > mellanox ethernet switch driver) could take a quick glance at the > code and verify that i am using the correct device as the parent > device. > > This work is also necessary for extending the ACPI thermal zone driver > to support the _TZD ACPI object in the future. Can you please elaborate a bit here? _TZD is a list of devices that belong to the given thermal zone, so how is it connected to the thermal zone parent? 
> Signed-off-by: Armin Wolf > --- > Armin Wolf (8): > thermal: core: Allow setting the parent device of cooling devices > thermal: core: Set parent device in thermal_of_cooling_device_register() > ACPI: processor: Stop creating "device" sysfs link > ACPI: fan: Stop creating "device" sysfs link > ACPI: video: Stop creating "device" sysfs link > thermal: core: Set parent device in thermal_cooling_device_register() > ACPI: thermal: Stop creating "device" sysfs link > thermal: core: Allow setting the parent device of thermal zone devices I can only see the first three patches in the series ATM as per https://lore.kernel.org/linux-pm/20251114-thermal-device-v1-0-d8b442aae38b@gmx.de/T/#r605b23f2e27e751d8406e7949dad6f5b5b112067 From jeff.johnson at oss.qualcomm.com Fri Nov 14 09:29:46 2025 From: jeff.johnson at oss.qualcomm.com (Jeff Johnson) Date: Fri, 14 Nov 2025 09:29:46 -0800 Subject: [PATCH 2/2] dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-2-a9adfc49e3f3@oss.qualcomm.com> <1703d8d7-5105-4585-b8f0-82bb54809718@kernel.org> Message-ID: <7757501b-2576-4f5d-a16a-40e06f12cb5f@oss.qualcomm.com> On 11/14/2025 3:18 AM, Manivannan Sadhasivam wrote: > On Fri, Nov 14, 2025 at 12:04:55PM +0100, Krzysztof Kozlowski wrote: >> On 14/11/2025 12:02, Manivannan Sadhasivam wrote: >>> On Fri, Nov 14, 2025 at 11:47:25AM +0100, Krzysztof Kozlowski wrote: >>>> On 14/11/2025 11:22, Manivannan Sadhasivam wrote: >>>>> On devicetree platforms, ath{10k/11k} drivers rely on the presence of the >>>>> 'qcom,calibration-variant' property to select the correct calibration data >>>>> for device variants with colliding IDs. >>>>> >>>>> But this property based selection has its own downside that it needs to be >>>>> added to the devicetree node of the WLAN device, especially for PCI based >>>>> devices. 
Currently, the users/vendors are forced to hardcode this property >>>>> in the PCI device node. If a different device need to be attached to the >>>>> slot, then the devicetree node also has to be changed. This approach is not >>>>> scalable and creates a bad user experience. >>>>> >>>>> So deprecate this property from WLAN devicetree nodes and let the drivers >>>>> do the devicetree model based calibration variant lookup using a static >>>>> table. >>>>> >>>>> This also warrants removing the property from examples in the binding. >>>>> >>>>> Signed-off-by: Manivannan Sadhasivam >>>>> --- >>>> >>>> The problem - visible in one of the examples here - is that one board >>>> has multiple WiFi chips and they use different calibration-variant >>>> properties. How do you find the right calibration variant for such case >>>> based on board machine match? >>>> >>> >>> I suspect the legitimacy of the example here. I don't understand how a single >>> machine can have same devices with 3 different calibration data. >> >> Me neither but I am not the domain expert here. >> >>> >>> AFAIU, calibration data is specific to the platform design. And I don't see any >>> upstream supported devicetree having similar properties. >> Deprecating these is fine with me, but I would prefer if we get here >> some clear answers that mentioned case cannot happen. If you are sure of >> that, please mention it in commit msg. >> > > I'm pretty sure that this example is wrong. But I will wait for Jeff or other > ath developers to confirm. As discussed privately this is a valid example. This is a single-band chip. So a tri-band router platform will have 3 boards, one that is supporting 2 GHz, one supporting 5 GHz, and one supporting 6 GHz, and each frequency range will have different calibration data. So we still need to support slot-specific configuration in cases where the slot to board mapping really is fixed in the platform. 
/jeff From prestwoj at gmail.com Fri Nov 14 13:52:22 2025 From: prestwoj at gmail.com (James Prestwood) Date: Fri, 14 Nov 2025 13:52:22 -0800 Subject: ath10k "failed to install key for vdev 0 peer : -110" In-Reply-To: <69232460-cd7b-4723-9ed4-b4473a7c5d90@gmail.com> References: <54fac081-7d70-4d31-9f2a-07f5d75d675d@quicinc.com> <22978701-ca79-4e90-8ceb-16bdaf230e8f@quicinc.com> <54f29515-047d-483d-8d9f-a0315a71ad7a@quicinc.com> <0e474fe5-cebc-487e-8884-ba505d83711a@quicinc.com> <69232460-cd7b-4723-9ed4-b4473a7c5d90@gmail.com> Message-ID: On 12/9/24 4:37 AM, James Prestwood wrote: > > On 12/8/24 10:48 PM, Baochen Qiang wrote: >> >> On 12/6/2024 8:27 PM, James Prestwood wrote: >>> Hi Baochen, >>> >>> On 12/5/24 6:47 PM, Baochen Qiang wrote: >>>> On 9/5/2024 9:46 AM, Baochen Qiang wrote: >>>>> On 9/5/2024 2:03 AM, Jeff Johnson wrote: >>>>>> On 8/16/2024 5:04 AM, James Prestwood wrote: >>>>>>> Hi Baochen, >>>>>>> >>>>>>> On 8/16/24 3:19 AM, Baochen Qiang wrote: >>>>>>>> On 7/12/2024 9:11 PM, James Prestwood wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I've seen this error mentioned on random forum posts, but its >>>>>>>>> always associated >>>>>>>>> with a kernel crash/warning or some very obvious negative >>>>>>>>> behavior. I've noticed >>>>>>>>> this occasionally and at one location very frequently during >>>>>>>>> FT roaming, >>>>>>>>> specifically just after CMD_ASSOCIATE is issued. For our >>>>>>>>> company run networks I'm >>>>>>>>> not seeing any negative behavior apart from a 3 second delay >>>>>>>>> in sending the re- >>>>>>>>> association frame since the kernel waits for this timeout. 
But >>>>>>>>> we have some >>>>>>>>> networks our clients run on that we do not own (different >>>>>>>>> vendor), and we are >>>>>>>>> seeing association timeouts after this error occurs and in >>>>>>>>> some cases the AP is >>>>>>>>> sending a deauthentication with reason code 8 instead of >>>>>>>>> replying with a >>>>>>>>> reassociation reply and an error status, which is quite odd. >>>>>>>>> >>>>>>>>> We are chasing down this with the vendor of these APs as well, >>>>>>>>> but the behavior >>>>>>>>> always happens after we see this key removal failure/timeout >>>>>>>>> on the client side. So >>>>>>>>> it would appear there is potentially a problem on both the >>>>>>>>> client and AP. My guess >>>>>>>>> is _something_ about the re-association frame changes when >>>>>>>>> this error is >>>>>>>>> encountered, but I cannot see how that would be the case. We >>>>>>>>> are working to get >>>>>>>>> PCAPs now, but its through a 3rd party, so that timing is out >>>>>>>>> of my control. >>>>>>>>> >>>>>>>>> From the kernel code this error would appear innocuous, the >>>>>>>>> old key is failing to >>>>>>>>> be removed but it gets immediately replaced by the new key. >>>>>>>>> And we don't see that >>>>>>>>> addition failing. Am I understanding that logic correctly? >>>>>>>>> I.e. 
this logic: >>>>>>>>> >>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ >>>>>>>>> >>>>>>>>> mac80211/key.c#n503 >>>>>>>>> >>>>>>>>> Below are a few kernel logs of the issue happening, some with >>>>>>>>> the deauth being sent >>>>>>>>> by the AP, some with just timeouts: >>>>>>>>> >>>>>>>>> --- No deauth frame sent, just association timeouts after the >>>>>>>>> error --- >>>>>>>>> >>>>>>>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP >>>>>>>> BSS> for new assoc to >>>>>>>>> >>>>>>>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to >>>>>>>>> install key for vdev 0 >>>>>>>>> peer?: -110 >>>>>>>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key >>>>>>>>> (0,?) from >>>>>>>>> hardware (-110) >>>>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with? (try 1/3) >>>>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with? (try 2/3) >>>>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with? (try 3/3) >>>>>>>>> Jul 11 00:05:33 kernel: wlan0: association with? >>>>>>>>> timed out >>>>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticate with >>>>>>>>> Jul 11 00:05:36 kernel: wlan0: send auth to?a (try 1/3) >>>>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticated >>>>>>>>> Jul 11 00:05:36 kernel: wlan0: associate with (try 1/3) >>>>>>>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from? 
>>>>>>>>> (capab=0x1111 status=0 >>>>>>>>> aid=16) >>>>>>>>> Jul 11 00:05:36 kernel: wlan0: associated >>>>>>>>> >>>>>>>>> --- Deauth frame sent amidst the association timeouts --- >>>>>>>>> >>>>>>>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP >>>>>>>> BSS> for new assoc to >>>>>>>>> >>>>>>>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to >>>>>>>>> install key for vdev 0 >>>>>>>>> peer : -110 >>>>>>>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, >>>>>>>>> ) from >>>>>>>>> hardware (-110) >>>>>>>>> Jul 11 00:43:21 kernel: wlan0: associate with (try 1/3) >>>>>>>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from >>>>>>>>> while associating >>>>>>>>> (Reason: 8=DISASSOC_STA_HAS_LEFT) >>>>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticate with >>>>>>>>> Jul 11 00:43:24 kernel: wlan0: send auth to (try 1/3) >>>>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticated >>>>>>>>> Jul 11 00:43:24 kernel: wlan0: associate with (try 1/3) >>>>>>>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from >>>>>>>>> (capab=0x1111 status=0 >>>>>>>>> aid=101) >>>>>>>>> Jul 11 00:43:24 kernel: wlan0: associated >>>>>>>>> >>>>>>>> Hi James, this is QCA6174, right? could you also share firmware >>>>>>>> version? >>>>>>> Yep, using: >>>>>>> >>>>>>> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261 >>>>>>> firmware ver WLAN.RM.4.4.1-00288- api 6 features >>>>>>> wowlan,ignore-otp,mfp >>>>>>> crc32 bf907c7c >>>>>>> >>>>>>> I did try in one instance the latest firmware, 309, and still >>>>>>> saw the >>>>>>> same behavior but 288 is what all our devices are running. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> James >>>>>> Baochen, are you looking more into this? Would prefer to fix the >>>>>> root cause >>>>>> rather than take "[RFC 0/1] wifi: ath10k: improvement on key >>>>>> removal failure" >>>>> I asked CST team to try to reproduce this issue such that we can >>>>> get firmware dump for >>>>> debug further. 
What I got is that CST team is currently busy at >>>>> other critical >>>>> schedules and they are planning to debug this ath10k issue after >>>>> those schedules get >>>>> finished. >>>>> >>>> Jeff, I am notified that CST team can not reproduce this issue. >>> Thanks for reaching out to them at least. Maybe the firmware team >>> can provide some info >>> about how long it _should_ take to remove a key and we can make the >>> timeout reflect that? >> are you implying that the failure is due to a not-long-enough wait in >> host driver? or you >> want to know the maximum time firmware needs in removing key, and if >> it is less than 3s we >> can reduce current timeout to WAR the issue you hit? > No I'm not implying the wait isn't long enough. I would like to know > the maximum time the firmware should take normally and only wait that > amount of time, which would fix the issues we see with Cisco APs. >> >>> Thanks, >>> >>> James >>> >>> Attempting to revive this thread again with additional information. After initially discovering this I have been carrying a patch which lowers the timeout to 1 second instead of 3. Though undesirable (since it delays roams by 1 second) it did work around the issue with Cisco APs. Unfortunately we now see the same issue with another vendor, "Extreme Networks", despite the delay being only 1 second. I can't remember if it was mentioned but we do not see this failure with other AP vendors like Meraki or Aruba, and even some clients that use Cisco don't experience it. But it appears to happen more (sometimes 90%+ of the time) with certain AP vendors. I cannot begin to imagine how the AP would have any effect on the driver/firmware's ability to remove a key locally, but here we are. Currently I'm thinking I have 2 options: - Further reduce the wait, but given the failure happens so consistently the roaming time will be at minimum whatever I set the timeout to. - Remove the wait entirely for DISABLE_KEY.
I have no idea if this is safe/recommended, but given the failure isn't handled (only an error log) it feels like I could remove it. Thanks, James From lkp at intel.com Sat Nov 15 01:51:28 2025 From: lkp at intel.com (kernel test robot) Date: Sat, 15 Nov 2025 17:51:28 +0800 Subject: [PATCH 1/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: <20251114-ath-variant-tbl-v1-1-a9adfc49e3f3@oss.qualcomm.com> References: <20251114-ath-variant-tbl-v1-1-a9adfc49e3f3@oss.qualcomm.com> Message-ID: <202511151754.ggEBSIV4-lkp@intel.com> Hi Manivannan, kernel test robot noticed the following build warnings: [auto build test WARNING on 3a8660878839faadb4f1a6dd72c3179c1df56787] url: https://github.com/intel-lab-lkp/linux/commits/Manivannan-Sadhasivam/wifi-ath-Use-static-calibration-variant-table-for-devicetree-platforms/20251114-183506 base: 3a8660878839faadb4f1a6dd72c3179c1df56787 patch link: https://lore.kernel.org/r/20251114-ath-variant-tbl-v1-1-a9adfc49e3f3%40oss.qualcomm.com patch subject: [PATCH 1/2] wifi: ath: Use static calibration variant table for devicetree platforms config: x86_64-rhel-9.4-ltp (https://download.01.org/0day-ci/archive/20251115/202511151754.ggEBSIV4-lkp@intel.com/config) compiler: gcc-14 (Debian 14.2.0-19) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251115/202511151754.ggEBSIV4-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e.
not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-kbuild-all/202511151754.ggEBSIV4-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/net/wireless/ath/ath10k/core.h:26,
                    from drivers/net/wireless/ath/ath10k/core.c:21:
   In function 'ath_get_calib_variant',
       inlined from 'ath10k_core_check_dt' at drivers/net/wireless/ath/ath10k/core.c:1164:12:
>> drivers/net/wireless/ath/ath10k/../ath.h:431:36: warning: argument 2 null where non-null expected [-Wnonnull]
     431 |         while ((entry->machine) && strcmp(entry->machine, model))
         |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/bitmap.h:13,
                    from include/linux/cpumask.h:12,
                    from arch/x86/include/asm/cpumask.h:5,
                    from arch/x86/include/asm/msr.h:11,
                    from arch/x86/include/asm/tsc.h:11,
                    from arch/x86/include/asm/timex.h:6,
                    from include/linux/timex.h:67,
                    from include/linux/time32.h:13,
                    from include/linux/time.h:60,
                    from include/linux/stat.h:19,
                    from include/linux/module.h:13,
                    from drivers/net/wireless/ath/ath10k/core.c:11:
   include/linux/string.h: In function 'ath10k_core_check_dt':
   include/linux/string.h:161:12: note: in a call to function 'strcmp' declared 'nonnull'
     161 | extern int strcmp(const char *,const char *);
         |            ^~~~~~
--
   In file included from drivers/net/wireless/ath/ath11k/core.c:23:
   In function 'ath_get_calib_variant',
       inlined from 'ath11k_core_check_dt' at drivers/net/wireless/ath/ath11k/core.c:1367:12:
>> drivers/net/wireless/ath/ath11k/../ath.h:431:36: warning: argument 2 null where non-null expected [-Wnonnull]
     431 |         while ((entry->machine) && strcmp(entry->machine, model))
         |                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/bitmap.h:13,
                    from include/linux/cpumask.h:12,
                    from arch/x86/include/asm/cpumask.h:5,
                    from arch/x86/include/asm/msr.h:11,
                    from arch/x86/include/asm/tsc.h:11,
                    from arch/x86/include/asm/timex.h:6,
                    from include/linux/timex.h:67,
                    from
include/linux/time32.h:13,
                    from include/linux/time.h:60,
                    from include/linux/stat.h:19,
                    from include/linux/module.h:13,
                    from drivers/net/wireless/ath/ath11k/core.c:9:
   include/linux/string.h: In function 'ath11k_core_check_dt':
   include/linux/string.h:161:12: note: in a call to function 'strcmp' declared 'nonnull'
     161 | extern int strcmp(const char *,const char *);
         |            ^~~~~~


vim +431 drivers/net/wireless/ath/ath10k/../ath.h

   424
   425  static inline const char *ath_get_calib_variant(void)
   426  {
   427          const struct __ath_calib_variant_table *entry = ath_calib_variant_table;
   428          struct device_node *root __free(device_node) = of_find_node_by_path("/");
   429          const char *model = of_get_property(root, "model", NULL);
   430
 > 431          while ((entry->machine) && strcmp(entry->machine, model))
   432                  entry++;
   433
   434          return entry->machine ? entry->variant : NULL;
   435  }
   436

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

From baochen.qiang at oss.qualcomm.com Sun Nov 16 18:36:39 2025
From: baochen.qiang at oss.qualcomm.com (Baochen Qiang)
Date: Mon, 17 Nov 2025 10:36:39 +0800
Subject: [PATCH 0/2] wifi: ath: Use static calibration variant table for devicetree platforms
In-Reply-To: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com>
References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com>
Message-ID: <2fd84ab2-2e3e-4d05-add5-17930a35fedf@oss.qualcomm.com>

On 11/14/2025 6:22 PM, Manivannan Sadhasivam wrote:
> Hi,
>
> This series aims to deprecate the usage of "qcom,*calibration-variant"
> devicetree property to select the calibration variant for the WLAN devices. This
> is necessary for WLAN devices connected using PCI bus, as hardcoding the device
> specific information in PCI devicetree node causes the node to be updated every
> time when a new device variant is attached to the PCI slot. This approach is not
> scalable and causes bad user experience.

I am not very clear about the problem here: is calibration variant device/module specific, or platform specific? If it is module specific, why the lookup is based on the machine 'model' property? While if it is platform specific, why do we need to update devicetree node whenever a new device is attached? > > So to avoid relying on the "qcom,*calibration-variant" property, this series > introduces a new static calibration variant table based lookup. The newly > introduced helper, ath_get_calib_variant() will parse the model name from > devicetree and use it to do the variant lookup during runtime. The > ath_calib_variant_table[] will hold all the model and calibration variant > entries for the supported devices. > > Going forward, new entries will be added to this table to support calibration > variants. > > Signed-off-by: Manivannan Sadhasivam > --- > Manivannan Sadhasivam (2): > wifi: ath: Use static calibration variant table for devicetree platforms > dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property > > .../bindings/net/wireless/qcom,ath10k.yaml | 1 + > .../bindings/net/wireless/qcom,ath11k-pci.yaml | 3 +- > .../bindings/net/wireless/qcom,ath11k.yaml | 1 + > .../bindings/net/wireless/qcom,ath12k-wsi.yaml | 6 +- > .../bindings/net/wireless/qcom,ipq5332-wifi.yaml | 2 +- > drivers/net/wireless/ath/ath.h | 98 ++++++++++++++++++++++ > drivers/net/wireless/ath/ath10k/core.c | 5 ++ > drivers/net/wireless/ath/ath11k/core.c | 7 ++ > 8 files changed, 115 insertions(+), 8 deletions(-) > --- > base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 > change-id: 20251114-ath-variant-tbl-22865456a527 > > Best regards, From manivannan.sadhasivam at oss.qualcomm.com Mon Nov 17 01:00:40 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Mon, 17 Nov 2025 14:30:40 +0530 Subject: [PATCH 0/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: 
<2fd84ab2-2e3e-4d05-add5-17930a35fedf@oss.qualcomm.com> References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <2fd84ab2-2e3e-4d05-add5-17930a35fedf@oss.qualcomm.com> Message-ID: On Mon, Nov 17, 2025 at 10:36:39AM +0800, Baochen Qiang wrote: > > > On 11/14/2025 6:22 PM, Manivannan Sadhasivam wrote: > > Hi, > > > > This series aims to deprecate the usage of "qcom,*calibration-variant" > > devicetree property to select the calibration variant for the WLAN devices. This > > is necessary for WLAN devices connected using PCI bus, as hardcoding the device > > specific information in PCI devicetree node causes the node to be updated every > > time when a new device variant is attached to the PCI slot. This approach is not > > scalable and causes bad user experience. > > I am not very clear about the problem here: is calibration variant device/module specific, > or platform specific? If it is module specific, why the lookup is based on the machine > 'model' property? While if it is platform specific, why do we need to update devicetree > node whenever a new device is attached? > I think I mixed the usecase of the 'firmware-name' property in the above description. But nevertheless, the calibration info platform specific, and hardcoding the DT property fixes the location of the WLAN card with a specific slot. For instance, if the board has a couple of M.2 slots, users should be free to plug the WLAN in any slot, not just a single slot where the property was defined in DT. Also, if the users plug-in the WLAN card of another vendor, not Qcom, this property is irrelevant/wrong. PCIe slots should be plug and play i.e., users should plug-in any M.2 card and expect it to work. However, as I learned from Jeff, calibration variant property is also going to be required in cases like router boards where each slot is dedicated to a fixed band and the calibration variant is going to be different for each band for the platform. 
So unlike I thought, this DT property cannot be deprecated. But going forward, I'd like it to be used only in these special usecases. Most of the upstream DTS have a single calibration variant for the platform and for those generic usecases, this static table should be used. - Mani > > > > So to avoid relying on the "qcom,*calibration-variant" property, this series > > introduces a new static calibration variant table based lookup. The newly > > introduced helper, ath_get_calib_variant() will parse the model name from > > devicetree and use it to do the variant lookup during runtime. The > > ath_calib_variant_table[] will hold all the model and calibration variant > > entries for the supported devices. > > > > Going forward, new entries will be added to this table to support calibration > > variants. > > > > Signed-off-by: Manivannan Sadhasivam > > --- > > Manivannan Sadhasivam (2): > > wifi: ath: Use static calibration variant table for devicetree platforms > > dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property > > > > .../bindings/net/wireless/qcom,ath10k.yaml | 1 + > > .../bindings/net/wireless/qcom,ath11k-pci.yaml | 3 +- > > .../bindings/net/wireless/qcom,ath11k.yaml | 1 + > > .../bindings/net/wireless/qcom,ath12k-wsi.yaml | 6 +- > > .../bindings/net/wireless/qcom,ipq5332-wifi.yaml | 2 +- > > drivers/net/wireless/ath/ath.h | 98 ++++++++++++++++++++++ > > drivers/net/wireless/ath/ath10k/core.c | 5 ++ > > drivers/net/wireless/ath/ath11k/core.c | 7 ++ > > 8 files changed, 115 insertions(+), 8 deletions(-) > > --- > > base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 > > change-id: 20251114-ath-variant-tbl-22865456a527 > > > > Best regards, > -- மணிவண்ணன் சதாசிவம்
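
The table lookup described in this message can be sketched in user-space C as follows. This is only an illustration under stated assumptions: the struct name, the table entries, and lookup_calib_variant() are hypothetical and not taken from the patch, and the model string is passed in by the caller instead of being read from the devicetree root node. The NULL check on model is added here because the kernel test robot reported a -Wnonnull warning on the strcmp() call, which suggests the "model" property may be absent.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the table entry type used by the patch. */
struct calib_variant_entry {
	const char *machine;	/* devicetree "model" string */
	const char *variant;	/* calibration variant to select */
};

/* Placeholder entries; the real table contents are not shown in the thread. */
static const struct calib_variant_entry calib_variant_table[] = {
	{ "Example Board A", "example_variant_a" },
	{ "Example Board B", "example_variant_b" },
	{ NULL, NULL },		/* sentinel terminating the table */
};

/* Return the variant for a model, or NULL if the model is missing or unknown. */
static const char *lookup_calib_variant(const char *model)
{
	const struct calib_variant_entry *entry = calib_variant_table;

	/* A root node without a "model" property would yield NULL here. */
	if (!model)
		return NULL;

	/* Walk the table until a match or the sentinel is reached. */
	while (entry->machine && strcmp(entry->machine, model))
		entry++;

	return entry->machine ? entry->variant : NULL;
}
```

With these placeholder entries, lookup_calib_variant("Example Board B") yields "example_variant_b", while a NULL or unlisted model yields NULL so the caller can fall back to another selection mechanism.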
From manivannan.sadhasivam at oss.qualcomm.com Mon Nov 17 01:03:10 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Mon, 17 Nov 2025 14:33:10 +0530 Subject: [PATCH 2/2] dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property In-Reply-To: <7757501b-2576-4f5d-a16a-40e06f12cb5f@oss.qualcomm.com> References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <20251114-ath-variant-tbl-v1-2-a9adfc49e3f3@oss.qualcomm.com> <1703d8d7-5105-4585-b8f0-82bb54809718@kernel.org> <7757501b-2576-4f5d-a16a-40e06f12cb5f@oss.qualcomm.com> Message-ID: On Fri, Nov 14, 2025 at 09:29:46AM -0800, Jeff Johnson wrote: > On 11/14/2025 3:18 AM, Manivannan Sadhasivam wrote: > > On Fri, Nov 14, 2025 at 12:04:55PM +0100, Krzysztof Kozlowski wrote: > >> On 14/11/2025 12:02, Manivannan Sadhasivam wrote: > >>> On Fri, Nov 14, 2025 at 11:47:25AM +0100, Krzysztof Kozlowski wrote: > >>>> On 14/11/2025 11:22, Manivannan Sadhasivam wrote: > >>>>> On devicetree platforms, ath{10k/11k} drivers rely on the presence of the > >>>>> 'qcom,calibration-variant' property to select the correct calibration data > >>>>> for device variants with colliding IDs. > >>>>> > >>>>> But this property based selection has its own downside that it needs to be > >>>>> added to the devicetree node of the WLAN device, especially for PCI based > >>>>> devices. Currently, the users/vendors are forced to hardcode this property > >>>>> in the PCI device node. If a different device need to be attached to the > >>>>> slot, then the devicetree node also has to be changed. This approach is not > >>>>> scalable and creates a bad user experience. > >>>>> > >>>>> So deprecate this property from WLAN devicetree nodes and let the drivers > >>>>> do the devicetree model based calibration variant lookup using a static > >>>>> table. > >>>>> > >>>>> This also warrants removing the property from examples in the binding. 
> >>>>> > >>>>> Signed-off-by: Manivannan Sadhasivam > >>>>> --- > >>>> > >>>> The problem - visible in one of the examples here - is that one board > >>>> has multiple WiFi chips and they use different calibration-variant > >>>> properties. How do you find the right calibration variant for such case > >>>> based on board machine match? > >>>> > >>> > >>> I suspect the legitimacy of the example here. I don't understand how a single > >>> machine can have same devices with 3 different calibration data. > >> > >> Me neither but I am not the domain expert here. > >> > >>> > >>> AFAIU, calibration data is specific to the platform design. And I don't see any > >>> upstream supported devicetree having similar properties. > >> Deprecating these is fine with me, but I would prefer if we get here > >> some clear answers that mentioned case cannot happen. If you are sure of > >> that, please mention it in commit msg. > >> > > > > I'm pretty sure that this example is wrong. But I will wait for Jeff or other > > ath developers to confirm. > > As discussed privately this is a valid example. This is a single-band chip. So > a tri-band router platform will have 3 boards, one that is supporting 2 GHz, > one supporting 5 GHz, and one supporting 6 GHz, and each frequency range will > have different calibration data. > > So we still need to support slot-specific configuration in cases where the > slot to board mapping really is fixed in the platform. > Thanks for letting me know of the multi-band usecase, which I was not aware of. Yes, this property has to be used for those special usecases, so we cannot deprecate it. But going forward, for the single calibration data usecase (like almost all upstream DTS), this static table should be used. - Mani -- ????????? ???????? 
From baochen.qiang at oss.qualcomm.com Mon Nov 17 01:40:06 2025 From: baochen.qiang at oss.qualcomm.com (Baochen Qiang) Date: Mon, 17 Nov 2025 17:40:06 +0800 Subject: [PATCH 0/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <2fd84ab2-2e3e-4d05-add5-17930a35fedf@oss.qualcomm.com> Message-ID: On 11/17/2025 5:00 PM, Manivannan Sadhasivam wrote: > On Mon, Nov 17, 2025 at 10:36:39AM +0800, Baochen Qiang wrote: >> >> >> On 11/14/2025 6:22 PM, Manivannan Sadhasivam wrote: >>> Hi, >>> >>> This series aims to deprecate the usage of "qcom,*calibration-variant" >>> devicetree property to select the calibration variant for the WLAN devices. This >>> is necessary for WLAN devices connected using PCI bus, as hardcoding the device >>> specific information in PCI devicetree node causes the node to be updated every >>> time when a new device variant is attached to the PCI slot. This approach is not >>> scalable and causes bad user experience. >> >> I am not very clear about the problem here: is calibration variant device/module specific, >> or platform specific? If it is module specific, why the lookup is based on the machine >> 'model' property? While if it is platform specific, why do we need to update devicetree >> node whenever a new device is attached? >> > > I think I mixed the usecase of the 'firmware-name' property in the above > description. > > But nevertheless, the calibration info platform specific, and hardcoding the DT > property fixes the location of the WLAN card with a specific slot. For instance, > if the board has a couple of M.2 slots, users should be free to plug the WLAN in > any slot, not just a single slot where the property was defined in DT. > > Also, if the users plug-in the WLAN card of another vendor, not Qcom, this > property is irrelevant/wrong. 
> > PCIe slots should be plug and play i.e., users should plug-in any M.2 card and > expect it to work. > correct > However, as I learned from Jeff, calibration variant property is also going to > be required in cases like router boards where each slot is dedicated to a fixed > band and the calibration variant is going to be different for each band for the > platform. So unlike I thought, this DT property cannot be deprecated. But going > forward, I'd like it to be used only in these special usecases. Most of the > upstream DTS have a single calibration variant for the platform and for those > generic usecases, this static table should be used. If that property is not going to be deprecated, should it take precedence? > > - Mani > >>> >>> So to avoid relying on the "qcom,*calibration-variant" property, this series >>> introduces a new static calibration variant table based lookup. The newly >>> introduced helper, ath_get_calib_variant() will parse the model name from >>> devicetree and use it to do the variant lookup during runtime. The >>> ath_calib_variant_table[] will hold all the model and calibration variant >>> entries for the supported devices. >>> >>> Going forward, new entries will be added to this table to support calibration >>> variants. 
>>> >>> Signed-off-by: Manivannan Sadhasivam >>> --- >>> Manivannan Sadhasivam (2): >>> wifi: ath: Use static calibration variant table for devicetree platforms >>> dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property >>> >>> .../bindings/net/wireless/qcom,ath10k.yaml | 1 + >>> .../bindings/net/wireless/qcom,ath11k-pci.yaml | 3 +- >>> .../bindings/net/wireless/qcom,ath11k.yaml | 1 + >>> .../bindings/net/wireless/qcom,ath12k-wsi.yaml | 6 +- >>> .../bindings/net/wireless/qcom,ipq5332-wifi.yaml | 2 +- >>> drivers/net/wireless/ath/ath.h | 98 ++++++++++++++++++++++ >>> drivers/net/wireless/ath/ath10k/core.c | 5 ++ >>> drivers/net/wireless/ath/ath11k/core.c | 7 ++ >>> 8 files changed, 115 insertions(+), 8 deletions(-) >>> --- >>> base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 >>> change-id: 20251114-ath-variant-tbl-22865456a527 >>> >>> Best regards, >> > From manivannan.sadhasivam at oss.qualcomm.com Mon Nov 17 04:45:20 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Mon, 17 Nov 2025 18:15:20 +0530 Subject: [PATCH 0/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <2fd84ab2-2e3e-4d05-add5-17930a35fedf@oss.qualcomm.com> Message-ID: On Mon, Nov 17, 2025 at 05:40:06PM +0800, Baochen Qiang wrote: > > > On 11/17/2025 5:00 PM, Manivannan Sadhasivam wrote: > > On Mon, Nov 17, 2025 at 10:36:39AM +0800, Baochen Qiang wrote: > >> > >> > >> On 11/14/2025 6:22 PM, Manivannan Sadhasivam wrote: > >>> Hi, > >>> > >>> This series aims to deprecate the usage of "qcom,*calibration-variant" > >>> devicetree property to select the calibration variant for the WLAN devices. This > >>> is necessary for WLAN devices connected using PCI bus, as hardcoding the device > >>> specific information in PCI devicetree node causes the node to be updated every > >>> time when a new device variant is attached to the PCI slot. 
This approach is not > >>> scalable and causes bad user experience. > >> > >> I am not very clear about the problem here: is calibration variant device/module specific, > >> or platform specific? If it is module specific, why the lookup is based on the machine > >> 'model' property? While if it is platform specific, why do we need to update devicetree > >> node whenever a new device is attached? > >> > > > > I think I mixed the usecase of the 'firmware-name' property in the above > > description. > > > > But nevertheless, the calibration info platform specific, and hardcoding the DT > > property fixes the location of the WLAN card with a specific slot. For instance, > > if the board has a couple of M.2 slots, users should be free to plug the WLAN in > > any slot, not just a single slot where the property was defined in DT. > > > > Also, if the users plug-in the WLAN card of another vendor, not Qcom, this > > property is irrelevant/wrong. > > > > PCIe slots should be plug and play i.e., users should plug-in any M.2 card and > > expect it to work. > > > > correct > > > However, as I learned from Jeff, calibration variant property is also going to > > be required in cases like router boards where each slot is dedicated to a fixed > > band and the calibration variant is going to be different for each band for the > > platform. So unlike I thought, this DT property cannot be deprecated. But going > > forward, I'd like it to be used only in these special usecases. Most of the > > upstream DTS have a single calibration variant for the platform and for those > > generic usecases, this static table should be used. > > If that property is not going to be deprecated, should it take precedence? > If you mean 'it' by this static table, yes, it is going to take precedence as it should cover the generic usecases. For special cases like the multi-band routers, existing DT node fallback will cover. 
- Mani > > > > - Mani > > > >>> > >>> So to avoid relying on the "qcom,*calibration-variant" property, this series > >>> introduces a new static calibration variant table based lookup. The newly > >>> introduced helper, ath_get_calib_variant() will parse the model name from > >>> devicetree and use it to do the variant lookup during runtime. The > >>> ath_calib_variant_table[] will hold all the model and calibration variant > >>> entries for the supported devices. > >>> > >>> Going forward, new entries will be added to this table to support calibration > >>> variants. > >>> > >>> Signed-off-by: Manivannan Sadhasivam > >>> --- > >>> Manivannan Sadhasivam (2): > >>> wifi: ath: Use static calibration variant table for devicetree platforms > >>> dt-bindings: wireless: ath: Deprecate 'qcom,calibration-variant' property > >>> > >>> .../bindings/net/wireless/qcom,ath10k.yaml | 1 + > >>> .../bindings/net/wireless/qcom,ath11k-pci.yaml | 3 +- > >>> .../bindings/net/wireless/qcom,ath11k.yaml | 1 + > >>> .../bindings/net/wireless/qcom,ath12k-wsi.yaml | 6 +- > >>> .../bindings/net/wireless/qcom,ipq5332-wifi.yaml | 2 +- > >>> drivers/net/wireless/ath/ath.h | 98 ++++++++++++++++++++++ > >>> drivers/net/wireless/ath/ath10k/core.c | 5 ++ > >>> drivers/net/wireless/ath/ath11k/core.c | 7 ++ > >>> 8 files changed, 115 insertions(+), 8 deletions(-) > >>> --- > >>> base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 > >>> change-id: 20251114-ath-variant-tbl-22865456a527 > >>> > >>> Best regards, > >> > > -- மணிவண்ணன் சதாசிவம்
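
The precedence stated in this reply (static table first, existing DT property as the fallback) could look roughly like the sketch below. The function name select_calib_variant() and its two parameters are hypothetical; the patch itself may structure this differently inside the ath10k/ath11k core code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: choose between the variant found in the static
 * table and the variant read from the DT property. Either may be NULL.
 */
static const char *select_calib_variant(const char *table_variant,
					const char *dt_variant)
{
	/* The static table covers the generic single-variant platforms. */
	if (table_variant)
		return table_variant;

	/* Otherwise fall back to the per-node DT property, if present. */
	return dt_variant;
}
```

Under this ordering, a platform that needs slot-specific variants, such as the multi-band routers mentioned in the thread, would presumably have to be left out of the static table so that its DT property is the one consulted.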
From jeff.johnson at oss.qualcomm.com Mon Nov 17 09:13:04 2025 From: jeff.johnson at oss.qualcomm.com (Jeff Johnson) Date: Mon, 17 Nov 2025 09:13:04 -0800 Subject: [PATCH 0/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <2fd84ab2-2e3e-4d05-add5-17930a35fedf@oss.qualcomm.com> Message-ID: On 11/17/2025 4:45 AM, Manivannan Sadhasivam wrote: > On Mon, Nov 17, 2025 at 05:40:06PM +0800, Baochen Qiang wrote: >> On 11/17/2025 5:00 PM, Manivannan Sadhasivam wrote: >>> On Mon, Nov 17, 2025 at 10:36:39AM +0800, Baochen Qiang wrote: >>>> On 11/14/2025 6:22 PM, Manivannan Sadhasivam wrote: >>>>> Hi, >>>>> >>>>> This series aims to deprecate the usage of "qcom,*calibration-variant" >>>>> devicetree property to select the calibration variant for the WLAN devices. This >>>>> is necessary for WLAN devices connected using PCI bus, as hardcoding the device >>>>> specific information in PCI devicetree node causes the node to be updated every >>>>> time when a new device variant is attached to the PCI slot. This approach is not >>>>> scalable and causes bad user experience. >>>> >>>> I am not very clear about the problem here: is calibration variant device/module specific, >>>> or platform specific? If it is module specific, why the lookup is based on the machine >>>> 'model' property? While if it is platform specific, why do we need to update devicetree >>>> node whenever a new device is attached? >>>> >>> >>> I think I mixed the usecase of the 'firmware-name' property in the above >>> description. >>> >>> But nevertheless, the calibration info platform specific, and hardcoding the DT >>> property fixes the location of the WLAN card with a specific slot. For instance, >>> if the board has a couple of M.2 slots, users should be free to plug the WLAN in >>> any slot, not just a single slot where the property was defined in DT. 
>>> >>> Also, if the users plug-in the WLAN card of another vendor, not Qcom, this >>> property is irrelevant/wrong. >>> >>> PCIe slots should be plug and play i.e., users should plug-in any M.2 card and >>> expect it to work. >>> >> >> correct >> >>> However, as I learned from Jeff, calibration variant property is also going to >>> be required in cases like router boards where each slot is dedicated to a fixed >>> band and the calibration variant is going to be different for each band for the >>> platform. So unlike I thought, this DT property cannot be deprecated. But going >>> forward, I'd like it to be used only in these special usecases. Most of the >>> upstream DTS have a single calibration variant for the platform and for those >>> generic usecases, this static table should be used. >> >> If that property is not going to be deprecated, should it take precedence? >> > > If you mean 'it' by this static table, yes, it is going to take precedence as it > should cover the generic usecases. For special cases like the multi-band > routers, existing DT node fallback will cover. Does there need to be a PCI Vendor ID & Device ID as part of this lookup? For example, start with a device that has an ath11k chipset with calibration data for that chipset. If the end user replaces that chipset with an ath12k chipset then with the current proposal the same calibration variant will attempt to be used. But there will not be any calibration data with that variant for that chipset. 
/jeff From manivannan.sadhasivam at oss.qualcomm.com Mon Nov 17 22:53:20 2025 From: manivannan.sadhasivam at oss.qualcomm.com (Manivannan Sadhasivam) Date: Tue, 18 Nov 2025 12:23:20 +0530 Subject: [PATCH 0/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <2fd84ab2-2e3e-4d05-add5-17930a35fedf@oss.qualcomm.com> Message-ID: On Mon, Nov 17, 2025 at 09:13:04AM -0800, Jeff Johnson wrote: > On 11/17/2025 4:45 AM, Manivannan Sadhasivam wrote: > > On Mon, Nov 17, 2025 at 05:40:06PM +0800, Baochen Qiang wrote: > >> On 11/17/2025 5:00 PM, Manivannan Sadhasivam wrote: > >>> On Mon, Nov 17, 2025 at 10:36:39AM +0800, Baochen Qiang wrote: > >>>> On 11/14/2025 6:22 PM, Manivannan Sadhasivam wrote: > >>>>> Hi, > >>>>> > >>>>> This series aims to deprecate the usage of "qcom,*calibration-variant" > >>>>> devicetree property to select the calibration variant for the WLAN devices. This > >>>>> is necessary for WLAN devices connected using PCI bus, as hardcoding the device > >>>>> specific information in PCI devicetree node causes the node to be updated every > >>>>> time when a new device variant is attached to the PCI slot. This approach is not > >>>>> scalable and causes bad user experience. > >>>> > >>>> I am not very clear about the problem here: is calibration variant device/module specific, > >>>> or platform specific? If it is module specific, why the lookup is based on the machine > >>>> 'model' property? While if it is platform specific, why do we need to update devicetree > >>>> node whenever a new device is attached? > >>>> > >>> > >>> I think I mixed the usecase of the 'firmware-name' property in the above > >>> description. > >>> > >>> But nevertheless, the calibration info platform specific, and hardcoding the DT > >>> property fixes the location of the WLAN card with a specific slot. 
For instance, > >>> if the board has a couple of M.2 slots, users should be free to plug the WLAN in > >>> any slot, not just a single slot where the property was defined in DT. > >>> > >>> Also, if the users plug-in the WLAN card of another vendor, not Qcom, this > >>> property is irrelevant/wrong. > >>> > >>> PCIe slots should be plug and play i.e., users should plug-in any M.2 card and > >>> expect it to work. > >>> > >> > >> correct > >> > >>> However, as I learned from Jeff, calibration variant property is also going to > >>> be required in cases like router boards where each slot is dedicated to a fixed > >>> band and the calibration variant is going to be different for each band for the > >>> platform. So unlike I thought, this DT property cannot be deprecated. But going > >>> forward, I'd like it to be used only in these special usecases. Most of the > >>> upstream DTS have a single calibration variant for the platform and for those > >>> generic usecases, this static table should be used. > >> > >> If that property is not going to be deprecated, should it take precedence? > >> > > > > If you mean 'it' by this static table, yes, it is going to take precedence as it > > should cover the generic usecases. For special cases like the multi-band > > routers, existing DT node fallback will cover. > Does there need to be a PCI Vendor ID & Device ID as part of this lookup? > I don't think so. > For example, start with a device that has an ath11k chipset with calibration > data for that chipset. If the end user replaces that chipset with an ath12k > chipset then with the current proposal the same calibration variant will > attempt to be used. But there will not be any calibration data with that > variant for that chipset. > ath12k doesn't seem to require a calibration variant. But even if the user replaces ath11k chipset with ath10k one, the calibration variant should be the same as it is platform specific except for WSI. - Mani -- மணிவண்ணன் சதாசிவம்
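The lookup order discussed in this thread (static table first, existing devicetree property as the fallback, with PCI IDs as an optional extra key) could be sketched roughly as below. This is a minimal userspace sketch and not code from the series: `calib_variant_entry`, `calib_variant_table`, `calib_variant_lookup()`, the board names, the variant strings and the PCI ID values are all hypothetical placeholders, and treating an ID of 0 as a wildcard is an assumption made only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical table entry; every name and value here is made up for
 * illustration and does not come from the posted series.
 */
struct calib_variant_entry {
	const char *model;   /* devicetree root "model" string */
	uint16_t vendor;     /* PCI vendor ID, 0 = match any (assumption) */
	uint16_t device;     /* PCI device ID, 0 = match any (assumption) */
	const char *variant; /* calibration variant to select */
};

/* More specific entries must come before wildcard entries. */
static const struct calib_variant_entry calib_variant_table[] = {
	{ "Example Board A", 0xabcd, 0x0001, "EXAMPLE_A_0001" },
	{ "Example Board A", 0,      0,      "EXAMPLE_A" },
	{ "Example Board B", 0,      0,      "EXAMPLE_B" },
};

/*
 * Return the variant from the static table if one matches, otherwise the
 * value of the devicetree property (dt_variant, which may be NULL).
 */
const char *calib_variant_lookup(const char *model, uint16_t vendor,
				 uint16_t device, const char *dt_variant)
{
	size_t n = sizeof(calib_variant_table) / sizeof(calib_variant_table[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		const struct calib_variant_entry *e = &calib_variant_table[i];

		if (strcmp(e->model, model) != 0)
			continue;
		if (e->vendor && e->vendor != vendor)
			continue;
		if (e->device && e->device != device)
			continue;
		return e->variant;
	}

	/* No table match: fall back to the devicetree property, if any. */
	return dt_variant;
}
```

With wildcard entries only, the table behaves as a purely platform-specific lookup; adding an entry with non-zero IDs is one possible way to cover the chipset-swap case raised above, at the cost of a larger table.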
From W_Armin at gmx.de Wed Nov 19 19:41:11 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Thu, 20 Nov 2025 04:41:11 +0100 Subject: [PATCH RFC RESEND 1/8] thermal: core: Allow setting the parent device of cooling devices In-Reply-To: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> Message-ID: <20251120-thermal-device-v1-1-bbdad594d57a@gmx.de> Currently, cooling devices have no parent device, potentially causing issues with suspend ordering and making it impossible for consumers (thermal zones and userspace applications) to associate a given cooling device with its parent device. Extend __thermal_cooling_device_register() to also accept a parent device pointer. For now only devm_thermal_of_cooling_device_register() uses this, as the other wrapper functions need to be extended first. Signed-off-by: Armin Wolf --- drivers/thermal/thermal_core.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 17ca5c082643..c8b720194b44 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -1040,6 +1040,7 @@ static void thermal_cooling_device_init_complete(struct thermal_cooling_device * /** * __thermal_cooling_device_register() - register a new thermal cooling device + * @parent: parent device pointer. * @np: a pointer to a device tree node. * @type: the thermal cooling device type. * @devdata: device private data. @@ -1055,7 +1056,7 @@ static void thermal_cooling_device_init_complete(struct thermal_cooling_device * * ERR_PTR. Caller must check return value with IS_ERR*() helpers. 
*/ static struct thermal_cooling_device * -__thermal_cooling_device_register(struct device_node *np, +__thermal_cooling_device_register(struct device *parent, struct device_node *np, const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { @@ -1092,6 +1093,7 @@ __thermal_cooling_device_register(struct device_node *np, cdev->ops = ops; cdev->updated = false; cdev->device.class = thermal_class; + cdev->device.parent = parent; cdev->devdata = devdata; ret = cdev->ops->get_max_state(cdev, &cdev->max_state); @@ -1158,7 +1160,7 @@ struct thermal_cooling_device * thermal_cooling_device_register(const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { - return __thermal_cooling_device_register(NULL, type, devdata, ops); + return __thermal_cooling_device_register(NULL, NULL, type, devdata, ops); } EXPORT_SYMBOL_GPL(thermal_cooling_device_register); @@ -1182,7 +1184,7 @@ thermal_of_cooling_device_register(struct device_node *np, const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { - return __thermal_cooling_device_register(np, type, devdata, ops); + return __thermal_cooling_device_register(NULL, np, type, devdata, ops); } EXPORT_SYMBOL_GPL(thermal_of_cooling_device_register); @@ -1222,7 +1224,7 @@ devm_thermal_of_cooling_device_register(struct device *dev, if (!ptr) return ERR_PTR(-ENOMEM); - tcd = __thermal_cooling_device_register(np, type, devdata, ops); + tcd = __thermal_cooling_device_register(dev, np, type, devdata, ops); if (IS_ERR(tcd)) { devres_free(ptr); return tcd; -- 2.39.5 From W_Armin at gmx.de Wed Nov 19 19:41:10 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Thu, 20 Nov 2025 04:41:10 +0100 Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices Message-ID: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> Drivers registering thermal zone/cooling devices are currently unable to tell the thermal core what parent device the new 
thermal zone/cooling device should have, potentially causing issues with suspend ordering and making it impossible for user space applications to associate a given thermal zone device with its parent device. This patch series aims to fix this issue by extending the functions used to register thermal zone/cooling devices to also accept a parent device pointer. The first six patches convert all functions used for registering cooling devices, while the functions used for registering thermal zone devices are converted by the remaining two patches. I tested this series on various devices containing (among others): - ACPI thermal zones - ACPI processor devices - PCIe cooling devices - Intel Wifi card - Intel powerclamp - Intel TCC cooling I also compile-tested the remaining affected drivers, however I would still be happy if the relevant maintainers (especially those of the Mellanox ethernet switch driver) could take a quick glance at the code and verify that I am using the correct device as the parent device. This work is also necessary for extending the ACPI thermal zone driver to support the _TZD ACPI object in the future. 
Signed-off-by: Armin Wolf --- Armin Wolf (8): thermal: core: Allow setting the parent device of cooling devices thermal: core: Set parent device in thermal_of_cooling_device_register() ACPI: processor: Stop creating "device" sysfs link ACPI: fan: Stop creating "device" sysfs link ACPI: video: Stop creating "device" sysfs link thermal: core: Set parent device in thermal_cooling_device_register() ACPI: thermal: Stop creating "device" sysfs link thermal: core: Allow setting the parent device of thermal zone devices Documentation/driver-api/thermal/sysfs-api.rst | 10 ++++- drivers/acpi/acpi_video.c | 9 +---- drivers/acpi/fan_core.c | 16 ++------ drivers/acpi/processor_thermal.c | 15 +------ drivers/acpi/thermal.c | 33 ++++++--------- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 +- drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c | 4 +- drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 47 +++++++++++----------- drivers/net/wireless/ath/ath10k/thermal.c | 2 +- drivers/net/wireless/ath/ath11k/thermal.c | 2 +- drivers/net/wireless/intel/iwlwifi/mld/thermal.c | 6 +-- drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 12 +++--- drivers/net/wireless/mediatek/mt76/mt7915/init.c | 2 +- drivers/net/wireless/mediatek/mt76/mt7996/init.c | 2 +- drivers/platform/x86/acerhdf.c | 4 +- drivers/power/supply/power_supply_core.c | 4 +- drivers/thermal/armada_thermal.c | 2 +- drivers/thermal/cpufreq_cooling.c | 2 +- drivers/thermal/cpuidle_cooling.c | 2 +- drivers/thermal/da9062-thermal.c | 2 +- drivers/thermal/devfreq_cooling.c | 2 +- drivers/thermal/dove_thermal.c | 2 +- drivers/thermal/imx_thermal.c | 2 +- .../intel/int340x_thermal/int3400_thermal.c | 2 +- .../intel/int340x_thermal/int3403_thermal.c | 4 +- .../intel/int340x_thermal/int3406_thermal.c | 2 +- .../intel/int340x_thermal/int340x_thermal_zone.c | 13 +++--- .../int340x_thermal/processor_thermal_device_pci.c | 7 ++-- drivers/thermal/intel/intel_pch_thermal.c | 2 +- drivers/thermal/intel/intel_powerclamp.c | 2 +- 
drivers/thermal/intel/intel_quark_dts_thermal.c | 2 +- drivers/thermal/intel/intel_soc_dts_iosf.c | 2 +- drivers/thermal/intel/intel_tcc_cooling.c | 2 +- drivers/thermal/intel/x86_pkg_temp_thermal.c | 6 +-- drivers/thermal/kirkwood_thermal.c | 2 +- drivers/thermal/pcie_cooling.c | 2 +- drivers/thermal/renesas/rcar_thermal.c | 10 +++-- drivers/thermal/spear_thermal.c | 2 +- drivers/thermal/tegra/soctherm.c | 5 +-- drivers/thermal/testing/zone.c | 2 +- drivers/thermal/thermal_core.c | 23 +++++++---- drivers/thermal/thermal_of.c | 9 +++-- include/linux/thermal.h | 22 +++++----- 43 files changed, 145 insertions(+), 162 deletions(-) --- base-commit: 653ef66b2c04bcdecaf3d13ea5069c4b1f27d5da change-id: 20251114-thermal-device-655d138824c6 Best regards, -- Armin Wolf From W_Armin at gmx.de Wed Nov 19 19:41:13 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Thu, 20 Nov 2025 04:41:13 +0100 Subject: [PATCH RFC RESEND 3/8] ACPI: processor: Stop creating "device" sysfs link In-Reply-To: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> Message-ID: <20251120-thermal-device-v1-3-bbdad594d57a@gmx.de> The thermal core will soon automatically create sysfs links between the cooling device and its parent device. Stop manually creating the "device" sysfs link between the cooling device and the parent device to avoid a name collision. The "thermal_cooling" sysfs link however stays for backwards compatibility, as it does not suffer from a name collision. 
Signed-off-by: Armin Wolf --- drivers/acpi/processor_thermal.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/drivers/acpi/processor_thermal.c b/drivers/acpi/processor_thermal.c index c7b1dc5687ec..1ff10321eac5 100644 --- a/drivers/acpi/processor_thermal.c +++ b/drivers/acpi/processor_thermal.c @@ -323,6 +323,7 @@ int acpi_processor_thermal_init(struct acpi_processor *pr, dev_dbg(&device->dev, "registered as cooling_device%d\n", pr->cdev->id); + /* For backwards compatibility */ result = sysfs_create_link(&device->dev.kobj, &pr->cdev->device.kobj, "thermal_cooling"); @@ -332,19 +333,8 @@ int acpi_processor_thermal_init(struct acpi_processor *pr, goto err_thermal_unregister; } - result = sysfs_create_link(&pr->cdev->device.kobj, - &device->dev.kobj, - "device"); - if (result) { - dev_err(&pr->cdev->device, - "Failed to create sysfs link 'device'\n"); - goto err_remove_sysfs_thermal; - } - return 0; -err_remove_sysfs_thermal: - sysfs_remove_link(&device->dev.kobj, "thermal_cooling"); err_thermal_unregister: thermal_cooling_device_unregister(pr->cdev); @@ -356,7 +346,6 @@ void acpi_processor_thermal_exit(struct acpi_processor *pr, { if (pr->cdev) { sysfs_remove_link(&device->dev.kobj, "thermal_cooling"); - sysfs_remove_link(&pr->cdev->device.kobj, "device"); thermal_cooling_device_unregister(pr->cdev); pr->cdev = NULL; } -- 2.39.5 From W_Armin at gmx.de Wed Nov 19 19:41:14 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Thu, 20 Nov 2025 04:41:14 +0100 Subject: [PATCH RFC RESEND 4/8] ACPI: fan: Stop creating "device" sysfs link In-Reply-To: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> Message-ID: <20251120-thermal-device-v1-4-bbdad594d57a@gmx.de> The thermal core will soon automatically create sysfs links between the cooling device and its parent device. 
Stop manually creating the "device" sysfs link between the cooling device and the parent device to avoid a name collision. The "thermal_cooling" sysfs link however stays for backwards compatibility, as it does not suffer from a name collision. Signed-off-by: Armin Wolf --- drivers/acpi/fan_core.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/drivers/acpi/fan_core.c b/drivers/acpi/fan_core.c index fb08b8549ed7..2ca3e347f15c 100644 --- a/drivers/acpi/fan_core.c +++ b/drivers/acpi/fan_core.c @@ -594,6 +594,7 @@ static int acpi_fan_probe(struct platform_device *pdev) dev_dbg(&pdev->dev, "registered as cooling_device%d\n", cdev->id); fan->cdev = cdev; + /* For backwards compatibility */ result = sysfs_create_link(&pdev->dev.kobj, &cdev->device.kobj, "thermal_cooling"); @@ -602,18 +603,8 @@ static int acpi_fan_probe(struct platform_device *pdev) goto err_unregister; } - result = sysfs_create_link(&cdev->device.kobj, - &pdev->dev.kobj, - "device"); - if (result) { - dev_err(&pdev->dev, "Failed to create sysfs link 'device'\n"); - goto err_remove_link; - } - return 0; -err_remove_link: - sysfs_remove_link(&pdev->dev.kobj, "thermal_cooling"); err_unregister: thermal_cooling_device_unregister(cdev); err_end: @@ -633,7 +624,6 @@ static void acpi_fan_remove(struct platform_device *pdev) acpi_fan_delete_attributes(device); } sysfs_remove_link(&pdev->dev.kobj, "thermal_cooling"); - sysfs_remove_link(&fan->cdev->device.kobj, "device"); thermal_cooling_device_unregister(fan->cdev); } -- 2.39.5 From W_Armin at gmx.de Wed Nov 19 19:41:12 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Thu, 20 Nov 2025 04:41:12 +0100 Subject: [PATCH RFC RESEND 2/8] thermal: core: Set parent device in thermal_of_cooling_device_register() In-Reply-To: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> Message-ID: <20251120-thermal-device-v1-2-bbdad594d57a@gmx.de> Extend thermal_of_cooling_device_register() 
to allow users to specify the parent device of the cooling device to be created. Signed-off-by: Armin Wolf --- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 ++-- drivers/thermal/cpufreq_cooling.c | 2 +- drivers/thermal/cpuidle_cooling.c | 2 +- drivers/thermal/devfreq_cooling.c | 2 +- drivers/thermal/tegra/soctherm.c | 5 ++--- drivers/thermal/thermal_core.c | 5 +++-- include/linux/thermal.h | 9 ++++----- 7 files changed, 14 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c index cf0d9049bcf1..f2c98e46a1c6 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c @@ -1778,8 +1778,8 @@ static int etnaviv_gpu_bind(struct device *dev, struct device *master, int ret; if (IS_ENABLED(CONFIG_DRM_ETNAVIV_THERMAL)) { - gpu->cooling = thermal_of_cooling_device_register(dev->of_node, - (char *)dev_name(dev), gpu, &cooling_ops); + gpu->cooling = thermal_of_cooling_device_register(dev, dev->of_node, dev_name(dev), + gpu, &cooling_ops); if (IS_ERR(gpu->cooling)) return PTR_ERR(gpu->cooling); } diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c index 6b7ab1814c12..af9250c44da7 100644 --- a/drivers/thermal/cpufreq_cooling.c +++ b/drivers/thermal/cpufreq_cooling.c @@ -593,7 +593,7 @@ __cpufreq_cooling_register(struct device_node *np, if (!name) goto remove_qos_req; - cdev = thermal_of_cooling_device_register(np, name, cpufreq_cdev, + cdev = thermal_of_cooling_device_register(dev, np, name, cpufreq_cdev, cooling_ops); kfree(name); diff --git a/drivers/thermal/cpuidle_cooling.c b/drivers/thermal/cpuidle_cooling.c index f678c1281862..520c89a36d90 100644 --- a/drivers/thermal/cpuidle_cooling.c +++ b/drivers/thermal/cpuidle_cooling.c @@ -207,7 +207,7 @@ static int __cpuidle_cooling_register(struct device_node *np, goto out_unregister; } - cdev = thermal_of_cooling_device_register(np, name, idle_cdev, + cdev = thermal_of_cooling_device_register(dev, np, 
name, idle_cdev, &cpuidle_cooling_ops); if (IS_ERR(cdev)) { ret = PTR_ERR(cdev); diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c index 8fd7cf1932cd..d91695ed0f26 100644 --- a/drivers/thermal/devfreq_cooling.c +++ b/drivers/thermal/devfreq_cooling.c @@ -454,7 +454,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df, if (!name) goto remove_qos_req; - cdev = thermal_of_cooling_device_register(np, name, dfc, ops); + cdev = thermal_of_cooling_device_register(dev, np, name, dfc, ops); kfree(name); if (IS_ERR(cdev)) { diff --git a/drivers/thermal/tegra/soctherm.c b/drivers/thermal/tegra/soctherm.c index 5d26b52beaba..4f43da123be4 100644 --- a/drivers/thermal/tegra/soctherm.c +++ b/drivers/thermal/tegra/soctherm.c @@ -1700,9 +1700,8 @@ static void soctherm_init_hw_throt_cdev(struct platform_device *pdev) stc->init = true; } else { - tcd = thermal_of_cooling_device_register(np_stcc, - (char *)name, ts, - &throt_cooling_ops); + tcd = thermal_of_cooling_device_register(dev, np_stcc, name, ts, + &throt_cooling_ops); if (IS_ERR_OR_NULL(tcd)) { dev_err(dev, "throttle-cfg: %s: failed to register cooling device\n", diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index c8b720194b44..5d752e712cc0 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -1166,6 +1166,7 @@ EXPORT_SYMBOL_GPL(thermal_cooling_device_register); /** * thermal_of_cooling_device_register() - register an OF thermal cooling device + * @parent: parent device pointer. * @np: a pointer to a device tree node. * @type: the thermal cooling device type. * @devdata: device private data. @@ -1180,11 +1181,11 @@ EXPORT_SYMBOL_GPL(thermal_cooling_device_register); * ERR_PTR. Caller must check return value with IS_ERR*() helpers. 
*/ struct thermal_cooling_device * -thermal_of_cooling_device_register(struct device_node *np, +thermal_of_cooling_device_register(struct device *parent, struct device_node *np, const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { - return __thermal_cooling_device_register(NULL, np, type, devdata, ops); + return __thermal_cooling_device_register(parent, np, type, devdata, ops); } EXPORT_SYMBOL_GPL(thermal_of_cooling_device_register); diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 0b5ed6821080..fa53d12173ce 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -253,8 +253,8 @@ void thermal_zone_device_update(struct thermal_zone_device *, struct thermal_cooling_device *thermal_cooling_device_register(const char *, void *, const struct thermal_cooling_device_ops *); struct thermal_cooling_device * -thermal_of_cooling_device_register(struct device_node *np, const char *, void *, - const struct thermal_cooling_device_ops *); +thermal_of_cooling_device_register(struct device *parent, struct device_node *np, const char *type, + void *devdata, const struct thermal_cooling_device_ops *); struct thermal_cooling_device * devm_thermal_of_cooling_device_register(struct device *dev, struct device_node *np, @@ -302,9 +302,8 @@ thermal_cooling_device_register(const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { return ERR_PTR(-ENODEV); } static inline struct thermal_cooling_device * -thermal_of_cooling_device_register(struct device_node *np, - const char *type, void *devdata, - const struct thermal_cooling_device_ops *ops) +thermal_of_cooling_device_register(struct device *parent, struct device_node *np, const char *type, + void *devdata, const struct thermal_cooling_device_ops *ops) { return ERR_PTR(-ENODEV); } static inline struct thermal_cooling_device * devm_thermal_of_cooling_device_register(struct device *dev, -- 2.39.5 From W_Armin at gmx.de Wed Nov 19 19:41:15 2025 From: W_Armin 
at gmx.de (Armin Wolf) Date: Thu, 20 Nov 2025 04:41:15 +0100 Subject: [PATCH RFC RESEND 5/8] ACPI: video: Stop creating "device" sysfs link In-Reply-To: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> Message-ID: <20251120-thermal-device-v1-5-bbdad594d57a@gmx.de> The thermal core will soon automatically create sysfs links between the cooling device and its parent device. Stop manually creating the "device" sysfs link between the cooling device and the parent device to avoid a name collision. The "thermal_cooling" sysfs link however stays for backwards compatibility, as it does not suffer from a name collision. Signed-off-by: Armin Wolf --- drivers/acpi/acpi_video.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c index be8e7e18abca..658e11745523 100644 --- a/drivers/acpi/acpi_video.c +++ b/drivers/acpi/acpi_video.c @@ -1774,16 +1774,12 @@ static void acpi_video_dev_register_backlight(struct acpi_video_device *device) dev_info(&device->dev->dev, "registered as cooling_device%d\n", device->cooling_dev->id); + /* For backwards compatibility */ result = sysfs_create_link(&device->dev->dev.kobj, &device->cooling_dev->device.kobj, "thermal_cooling"); if (result) pr_info("sysfs link creation failed\n"); - - result = sysfs_create_link(&device->cooling_dev->device.kobj, - &device->dev->dev.kobj, "device"); - if (result) - pr_info("Reverse sysfs link creation failed\n"); } static void acpi_video_run_bcl_for_osi(struct acpi_video_bus *video) @@ -1852,7 +1848,6 @@ static void acpi_video_dev_unregister_backlight(struct acpi_video_device *device } if (device->cooling_dev) { sysfs_remove_link(&device->dev->dev.kobj, "thermal_cooling"); - sysfs_remove_link(&device->cooling_dev->device.kobj, "device"); thermal_cooling_device_unregister(device->cooling_dev); device->cooling_dev = NULL; } -- 2.39.5 From W_Armin at gmx.de Wed Nov 19 19:41:16 
2025 From: W_Armin at gmx.de (Armin Wolf) Date: Thu, 20 Nov 2025 04:41:16 +0100 Subject: [PATCH RFC RESEND 6/8] thermal: core: Set parent device in thermal_cooling_device_register() In-Reply-To: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> Message-ID: <20251120-thermal-device-v1-6-bbdad594d57a@gmx.de> Extend thermal_cooling_device_register() to allow users to specify the parent device of the cooling device to be created. Signed-off-by: Armin Wolf --- Documentation/driver-api/thermal/sysfs-api.rst | 5 ++++- drivers/acpi/acpi_video.c | 2 +- drivers/acpi/fan_core.c | 4 ++-- drivers/acpi/processor_thermal.c | 2 +- drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 2 +- drivers/net/wireless/ath/ath10k/thermal.c | 2 +- drivers/net/wireless/ath/ath11k/thermal.c | 2 +- drivers/net/wireless/intel/iwlwifi/mld/thermal.c | 4 +--- drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 2 +- drivers/net/wireless/mediatek/mt76/mt7915/init.c | 2 +- drivers/net/wireless/mediatek/mt76/mt7996/init.c | 2 +- drivers/platform/x86/acerhdf.c | 2 +- drivers/thermal/intel/int340x_thermal/int3403_thermal.c | 4 ++-- drivers/thermal/intel/int340x_thermal/int3406_thermal.c | 2 +- drivers/thermal/intel/intel_powerclamp.c | 2 +- drivers/thermal/intel/intel_tcc_cooling.c | 2 +- drivers/thermal/pcie_cooling.c | 2 +- drivers/thermal/thermal_core.c | 5 +++-- include/linux/thermal.h | 9 +++++---- 19 files changed, 30 insertions(+), 27 deletions(-) diff --git a/Documentation/driver-api/thermal/sysfs-api.rst b/Documentation/driver-api/thermal/sysfs-api.rst index f73de211bdce..cf242cd16f2e 100644 --- a/Documentation/driver-api/thermal/sysfs-api.rst +++ b/Documentation/driver-api/thermal/sysfs-api.rst @@ -215,13 +215,16 @@ temperature) and throttle appropriate devices. 
:: struct thermal_cooling_device - *thermal_cooling_device_register(char *name, + *thermal_cooling_device_register(struct device *parent, char *name, void *devdata, struct thermal_cooling_device_ops *) This interface function adds a new thermal cooling device (fan/processor/...) to /sys/class/thermal/ folder as `cooling_device[0-*]`. It tries to bind itself to all the thermal zone devices registered at the same time. + parent: + parent device pointer. + name: the cooling device name. devdata: diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c index 658e11745523..eae1ff9805b1 100644 --- a/drivers/acpi/acpi_video.c +++ b/drivers/acpi/acpi_video.c @@ -1759,7 +1759,7 @@ static void acpi_video_dev_register_backlight(struct acpi_video_device *device) device->backlight->props.brightness = acpi_video_get_brightness(device->backlight); - device->cooling_dev = thermal_cooling_device_register("LCD", device, + device->cooling_dev = thermal_cooling_device_register(parent, "LCD", device, &video_cooling_ops); if (IS_ERR(device->cooling_dev)) { /* diff --git a/drivers/acpi/fan_core.c b/drivers/acpi/fan_core.c index 2ca3e347f15c..7ebf2529fbfd 100644 --- a/drivers/acpi/fan_core.c +++ b/drivers/acpi/fan_core.c @@ -584,8 +584,8 @@ static int acpi_fan_probe(struct platform_device *pdev) else name = acpi_device_bid(device); - cdev = thermal_cooling_device_register(name, device, - &fan_cooling_ops); + cdev = thermal_cooling_device_register(&pdev->dev, name, device, + &fan_cooling_ops); if (IS_ERR(cdev)) { result = PTR_ERR(cdev); goto err_end; diff --git a/drivers/acpi/processor_thermal.c b/drivers/acpi/processor_thermal.c index 1ff10321eac5..a7307f5d137f 100644 --- a/drivers/acpi/processor_thermal.c +++ b/drivers/acpi/processor_thermal.c @@ -313,7 +313,7 @@ int acpi_processor_thermal_init(struct acpi_processor *pr, { int result = 0; - pr->cdev = thermal_cooling_device_register("Processor", device, + pr->cdev = thermal_cooling_device_register(&device->dev, "Processor", 
device, &processor_cooling_ops); if (IS_ERR(pr->cdev)) { result = PTR_ERR(pr->cdev); diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c index eac9a14a6058..1117d59b74fd 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c @@ -693,7 +693,7 @@ int mlxsw_thermal_init(struct mlxsw_core *core, mlxsw_cdev = &thermal->cdevs[i]; mlxsw_cdev->thermal = thermal; mlxsw_cdev->idx = i; - cdev = thermal_cooling_device_register("mlxsw_fan", + cdev = thermal_cooling_device_register(dev, "mlxsw_fan", mlxsw_cdev, &mlxsw_cooling_ops); if (IS_ERR(cdev)) { diff --git a/drivers/net/wireless/ath/ath10k/thermal.c b/drivers/net/wireless/ath/ath10k/thermal.c index 8b15ec07b107..16eb41b928ba 100644 --- a/drivers/net/wireless/ath/ath10k/thermal.c +++ b/drivers/net/wireless/ath/ath10k/thermal.c @@ -161,7 +161,7 @@ int ath10k_thermal_register(struct ath10k *ar) if (!test_bit(WMI_SERVICE_THERM_THROT, ar->wmi.svc_map)) return 0; - cdev = thermal_cooling_device_register("ath10k_thermal", ar, + cdev = thermal_cooling_device_register(ar->dev, "ath10k_thermal", ar, &ath10k_thermal_ops); if (IS_ERR(cdev)) { diff --git a/drivers/net/wireless/ath/ath11k/thermal.c b/drivers/net/wireless/ath/ath11k/thermal.c index 18d6eab5cce3..363697ce8641 100644 --- a/drivers/net/wireless/ath/ath11k/thermal.c +++ b/drivers/net/wireless/ath/ath11k/thermal.c @@ -172,7 +172,7 @@ int ath11k_thermal_register(struct ath11k_base *ab) if (!ar) continue; - cdev = thermal_cooling_device_register("ath11k_thermal", ar, + cdev = thermal_cooling_device_register(&ar->hw->wiphy->dev, "ath11k_thermal", ar, &ath11k_thermal_ops); if (IS_ERR(cdev)) { diff --git a/drivers/net/wireless/intel/iwlwifi/mld/thermal.c b/drivers/net/wireless/intel/iwlwifi/mld/thermal.c index f8a8c35066be..9e56e6e80ab7 100644 --- a/drivers/net/wireless/intel/iwlwifi/mld/thermal.c +++ b/drivers/net/wireless/intel/iwlwifi/mld/thermal.c 
@@ -366,9 +366,7 @@ static void iwl_mld_cooling_device_register(struct iwl_mld *mld) BUILD_BUG_ON(ARRAY_SIZE(name) >= THERMAL_NAME_LENGTH); mld->cooling_dev.cdev = - thermal_cooling_device_register(name, - mld, - &tcooling_ops); + thermal_cooling_device_register(mld->dev, name, mld, &tcooling_ops); if (IS_ERR(mld->cooling_dev.cdev)) { IWL_DEBUG_TEMP(mld, diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c index 53bab21ebae2..b184f08230b9 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c @@ -744,7 +744,7 @@ static void iwl_mvm_cooling_device_register(struct iwl_mvm *mvm) BUILD_BUG_ON(ARRAY_SIZE(name) >= THERMAL_NAME_LENGTH); mvm->cooling_dev.cdev = - thermal_cooling_device_register(name, + thermal_cooling_device_register(mvm->dev, name, mvm, &tcooling_ops); diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/init.c b/drivers/net/wireless/mediatek/mt76/mt7915/init.c index 5ea8b46e092e..cb08bb36f6e2 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7915/init.c +++ b/drivers/net/wireless/mediatek/mt76/mt7915/init.c @@ -200,7 +200,7 @@ static int mt7915_thermal_init(struct mt7915_phy *phy) if (!name) return -ENOMEM; - cdev = thermal_cooling_device_register(name, phy, &mt7915_thermal_ops); + cdev = thermal_cooling_device_register(&wiphy->dev, name, phy, &mt7915_thermal_ops); if (!IS_ERR(cdev)) { if (sysfs_create_link(&wiphy->dev.kobj, &cdev->device.kobj, "cooling_device") < 0) diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/init.c b/drivers/net/wireless/mediatek/mt76/mt7996/init.c index 5e95a36b42d1..bb6e55d79d0e 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7996/init.c +++ b/drivers/net/wireless/mediatek/mt76/mt7996/init.c @@ -249,7 +249,7 @@ static int mt7996_thermal_init(struct mt7996_phy *phy) snprintf(cname, sizeof(cname), "cooling_device%d", phy->mt76->band_idx); - cdev = thermal_cooling_device_register(name, phy, &mt7996_thermal_ops); + cdev = 
thermal_cooling_device_register(&wiphy->dev, name, phy, &mt7996_thermal_ops); if (!IS_ERR(cdev)) { if (sysfs_create_link(&wiphy->dev.kobj, &cdev->device.kobj, cname) < 0) diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c index 5ce5ad3efe69..c74937d475e5 100644 --- a/drivers/platform/x86/acerhdf.c +++ b/drivers/platform/x86/acerhdf.c @@ -650,7 +650,7 @@ static int __init acerhdf_register_thermal(void) { int ret; - cl_dev = thermal_cooling_device_register("acerhdf-fan", NULL, + cl_dev = thermal_cooling_device_register(NULL, "acerhdf-fan", NULL, &acerhdf_cooling_ops); if (IS_ERR(cl_dev)) diff --git a/drivers/thermal/intel/int340x_thermal/int3403_thermal.c b/drivers/thermal/intel/int340x_thermal/int3403_thermal.c index 264c9bc8e645..08d9e91f01cb 100644 --- a/drivers/thermal/intel/int340x_thermal/int3403_thermal.c +++ b/drivers/thermal/intel/int340x_thermal/int3403_thermal.c @@ -178,8 +178,8 @@ static int int3403_cdev_add(struct int3403_priv *priv) priv->priv = obj; obj->max_state = p->package.count - 1; obj->cdev = - thermal_cooling_device_register(acpi_device_bid(priv->adev), - priv, &int3403_cooling_ops); + thermal_cooling_device_register(&priv->adev->dev, acpi_device_bid(priv->adev), + priv, &int3403_cooling_ops); if (IS_ERR(obj->cdev)) result = PTR_ERR(obj->cdev); diff --git a/drivers/thermal/intel/int340x_thermal/int3406_thermal.c b/drivers/thermal/intel/int340x_thermal/int3406_thermal.c index e21fcbccf4ba..e458add39a88 100644 --- a/drivers/thermal/intel/int340x_thermal/int3406_thermal.c +++ b/drivers/thermal/intel/int340x_thermal/int3406_thermal.c @@ -157,7 +157,7 @@ static int int3406_thermal_probe(struct platform_device *pdev) int3406_thermal_get_limit(d); - d->cooling_dev = thermal_cooling_device_register(acpi_device_bid(adev), + d->cooling_dev = thermal_cooling_device_register(&pdev->dev, acpi_device_bid(adev), d, &video_cooling_ops); if (IS_ERR(d->cooling_dev)) goto err; diff --git a/drivers/thermal/intel/intel_powerclamp.c 
b/drivers/thermal/intel/intel_powerclamp.c index 9a4cec000910..a8f798bf459f 100644 --- a/drivers/thermal/intel/intel_powerclamp.c +++ b/drivers/thermal/intel/intel_powerclamp.c @@ -779,7 +779,7 @@ static int __init powerclamp_init(void) /* set default limit, maybe adjusted during runtime based on feedback */ window_size = 2; - cooling_dev = thermal_cooling_device_register("intel_powerclamp", NULL, + cooling_dev = thermal_cooling_device_register(NULL, "intel_powerclamp", NULL, &powerclamp_cooling_ops); if (IS_ERR(cooling_dev)) return -ENODEV; diff --git a/drivers/thermal/intel/intel_tcc_cooling.c b/drivers/thermal/intel/intel_tcc_cooling.c index f352ecafbedf..a0ead0fb1fbe 100644 --- a/drivers/thermal/intel/intel_tcc_cooling.c +++ b/drivers/thermal/intel/intel_tcc_cooling.c @@ -101,7 +101,7 @@ static int __init tcc_cooling_init(void) pr_info("Programmable TCC Offset detected\n"); tcc_cdev = - thermal_cooling_device_register("TCC Offset", NULL, + thermal_cooling_device_register(NULL, "TCC Offset", NULL, &tcc_cooling_ops); if (IS_ERR(tcc_cdev)) { ret = PTR_ERR(tcc_cdev); diff --git a/drivers/thermal/pcie_cooling.c b/drivers/thermal/pcie_cooling.c index a876d64f1582..4d37f7f9d108 100644 --- a/drivers/thermal/pcie_cooling.c +++ b/drivers/thermal/pcie_cooling.c @@ -61,7 +61,7 @@ struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port if (!name) return ERR_PTR(-ENOMEM); - return thermal_cooling_device_register(name, port, &pcie_cooling_ops); + return thermal_cooling_device_register(&port->dev, name, port, &pcie_cooling_ops); } void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev) diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 5d752e712cc0..92e51d2e4535 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -1145,6 +1145,7 @@ __thermal_cooling_device_register(struct device *parent, struct device_node *np, /** * thermal_cooling_device_register() - register a new 
thermal cooling device + * @parent: parent device pointer. * @type: the thermal cooling device type. * @devdata: device private data. * @ops: standard thermal cooling devices callbacks. @@ -1157,10 +1158,10 @@ __thermal_cooling_device_register(struct device *parent, struct device_node *np, * ERR_PTR. Caller must check return value with IS_ERR*() helpers. */ struct thermal_cooling_device * -thermal_cooling_device_register(const char *type, void *devdata, +thermal_cooling_device_register(struct device *parent, const char *type, void *devdata, const struct thermal_cooling_device_ops *ops) { - return __thermal_cooling_device_register(NULL, NULL, type, devdata, ops); + return __thermal_cooling_device_register(parent, NULL, type, devdata, ops); } EXPORT_SYMBOL_GPL(thermal_cooling_device_register); diff --git a/include/linux/thermal.h b/include/linux/thermal.h index fa53d12173ce..29a608bf5f80 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -250,8 +250,9 @@ struct device *thermal_zone_device(struct thermal_zone_device *tzd); void thermal_zone_device_update(struct thermal_zone_device *, enum thermal_notify_event); -struct thermal_cooling_device *thermal_cooling_device_register(const char *, - void *, const struct thermal_cooling_device_ops *); +struct thermal_cooling_device * +thermal_cooling_device_register(struct device *parent, const char *type, void *drvdata, + const struct thermal_cooling_device_ops *ops); struct thermal_cooling_device * thermal_of_cooling_device_register(struct device *parent, struct device_node *np, const char *type, void *devdata, const struct thermal_cooling_device_ops *); @@ -298,8 +299,8 @@ static inline void thermal_zone_device_update(struct thermal_zone_device *tz, { } static inline struct thermal_cooling_device * -thermal_cooling_device_register(const char *type, void *devdata, - const struct thermal_cooling_device_ops *ops) +thermal_cooling_device_register(struct device *parent, const char *type, void *devdata, + const 
struct thermal_cooling_device_ops *ops)
 { return ERR_PTR(-ENODEV); }
 static inline struct thermal_cooling_device *
 thermal_of_cooling_device_register(struct device *parent, struct device_node *np, const char *type,
-- 
2.39.5

From W_Armin at gmx.de Wed Nov 19 19:41:18 2025
From: W_Armin at gmx.de (Armin Wolf)
Date: Thu, 20 Nov 2025 04:41:18 +0100
Subject: [PATCH RFC RESEND 8/8] thermal: core: Allow setting the parent device of thermal zone devices
In-Reply-To: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de>
References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de>
Message-ID: <20251120-thermal-device-v1-8-bbdad594d57a@gmx.de>

Thermal zone devices currently have no parent device, potentially
causing issues with suspend ordering and making it impossible for
user space applications to associate a given thermal zone device with
its parent device.

Extend the functions used to register thermal zone devices to also
accept a parent device pointer. Also update all users of those
functions to provide a parent device pointer if available.

Signed-off-by: Armin Wolf --- Documentation/driver-api/thermal/sysfs-api.rst | 5 ++- drivers/acpi/thermal.c | 16 +++++--- drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c | 4 +- drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 45 +++++++++++----------- drivers/net/wireless/intel/iwlwifi/mld/thermal.c | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/tt.c | 10 ++--- drivers/platform/x86/acerhdf.c | 2 +- drivers/power/supply/power_supply_core.c | 4 +- drivers/thermal/armada_thermal.c | 2 +- drivers/thermal/da9062-thermal.c | 2 +- drivers/thermal/dove_thermal.c | 2 +- drivers/thermal/imx_thermal.c | 2 +- .../intel/int340x_thermal/int3400_thermal.c | 2 +- .../intel/int340x_thermal/int340x_thermal_zone.c | 13 +++---- .../int340x_thermal/processor_thermal_device_pci.c | 7 ++-- drivers/thermal/intel/intel_pch_thermal.c | 2 +- drivers/thermal/intel/intel_quark_dts_thermal.c | 2 +- drivers/thermal/intel/intel_soc_dts_iosf.c | 2 +- drivers/thermal/intel/x86_pkg_temp_thermal.c | 6 +-- drivers/thermal/kirkwood_thermal.c | 2 +- drivers/thermal/renesas/rcar_thermal.c | 10 +++-- drivers/thermal/spear_thermal.c | 2 +- drivers/thermal/testing/zone.c | 2 +- drivers/thermal/thermal_core.c | 7 +++- drivers/thermal/thermal_of.c | 9 +++-- include/linux/thermal.h | 4 ++ 26 files changed, 92 insertions(+), 74 deletions(-) diff --git a/Documentation/driver-api/thermal/sysfs-api.rst b/Documentation/driver-api/thermal/sysfs-api.rst index cf242cd16f2e..0a29bc949ef3 100644 --- a/Documentation/driver-api/thermal/sysfs-api.rst +++ b/Documentation/driver-api/thermal/sysfs-api.rst @@ -37,7 +37,8 @@ temperature) and throttle appropriate devices. :: struct thermal_zone_device * - thermal_zone_device_register_with_trips(const char *type, + thermal_zone_device_register_with_trips(struct device *parent, + const char *type, const struct thermal_trip *trips, int num_trips, void *devdata, const struct thermal_zone_device_ops *ops, @@ -49,6 +50,8 @@ temperature) and throttle appropriate devices. 
/sys/class/thermal folder as `thermal_zone[0-*]`. It tries to bind all the thermal cooling devices registered to it at the same time. + parent: + parent device pointer. type: the thermal zone type. trips: diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c index 99ad67bbd764..483e28ce0d67 100644 --- a/drivers/acpi/thermal.c +++ b/drivers/acpi/thermal.c @@ -607,16 +607,20 @@ static int acpi_thermal_register_thermal_zone(struct acpi_thermal *tz, unsigned int trip_count, int passive_delay) { + unsigned int polling_delay = tz->polling_frequency * 100; int result; if (trip_count) - tz->thermal_zone = thermal_zone_device_register_with_trips( - "acpitz", trip_table, trip_count, tz, - &acpi_thermal_zone_ops, NULL, passive_delay, - tz->polling_frequency * 100); + tz->thermal_zone = thermal_zone_device_register_with_trips(&tz->device->dev, + "acpitz", trip_table, + trip_count, tz, + &acpi_thermal_zone_ops, + NULL, passive_delay, + polling_delay); else - tz->thermal_zone = thermal_tripless_zone_device_register( - "acpitz", tz, &acpi_thermal_zone_ops, NULL); + tz->thermal_zone = thermal_tripless_zone_device_register(&tz->device->dev, "acpitz", + tz, &acpi_thermal_zone_ops, + NULL); if (IS_ERR(tz->thermal_zone)) return PTR_ERR(tz->thermal_zone); diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c index 7bab8da8f6e6..05a1ec7df7a5 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c @@ -59,8 +59,8 @@ int cxgb4_thermal_init(struct adapter *adap) } snprintf(ch_tz_name, sizeof(ch_tz_name), "cxgb4_%s", adap->name); - ch_thermal->tzdev = thermal_zone_device_register_with_trips(ch_tz_name, &trip, num_trip, - adap, + ch_thermal->tzdev = thermal_zone_device_register_with_trips(adap->pdev_dev, ch_tz_name, + &trip, num_trip, adap, &cxgb4_thermal_ops, NULL, 0, 0); if (IS_ERR(ch_thermal->tzdev)) { diff --git 
a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c index 1117d59b74fd..a1b1e9e8dd3d 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c @@ -349,6 +349,8 @@ static const struct thermal_cooling_device_ops mlxsw_cooling_ops = { static int mlxsw_thermal_module_tz_init(struct mlxsw_thermal_module *module_tz) { + unsigned int polling_delay = module_tz->parent->polling_delay; + struct device *dev = module_tz->parent->bus_info->dev; char tz_name[40]; int err; @@ -358,14 +360,12 @@ mlxsw_thermal_module_tz_init(struct mlxsw_thermal_module *module_tz) else snprintf(tz_name, sizeof(tz_name), "mlxsw-module%d", module_tz->module + 1); - module_tz->tzdev = thermal_zone_device_register_with_trips(tz_name, - module_tz->trips, - MLXSW_THERMAL_NUM_TRIPS, - module_tz, - &mlxsw_thermal_module_ops, - &mlxsw_thermal_params, - 0, - module_tz->parent->polling_delay); + module_tz->tzdev = thermal_zone_device_register_with_trips(dev, tz_name, module_tz->trips, + MLXSW_THERMAL_NUM_TRIPS, + module_tz, + &mlxsw_thermal_module_ops, + &mlxsw_thermal_params, 0, + polling_delay); if (IS_ERR(module_tz->tzdev)) { err = PTR_ERR(module_tz->tzdev); return err; @@ -466,6 +466,8 @@ mlxsw_thermal_modules_fini(struct mlxsw_thermal *thermal, static int mlxsw_thermal_gearbox_tz_init(struct mlxsw_thermal_module *gearbox_tz) { + unsigned int polling_delay = gearbox_tz->parent->polling_delay; + struct device *dev = gearbox_tz->parent->bus_info->dev; char tz_name[40]; int ret; @@ -475,13 +477,13 @@ mlxsw_thermal_gearbox_tz_init(struct mlxsw_thermal_module *gearbox_tz) else snprintf(tz_name, sizeof(tz_name), "mlxsw-gearbox%d", gearbox_tz->module + 1); - gearbox_tz->tzdev = thermal_zone_device_register_with_trips(tz_name, - gearbox_tz->trips, - MLXSW_THERMAL_NUM_TRIPS, - gearbox_tz, - &mlxsw_thermal_gearbox_ops, - &mlxsw_thermal_params, 0, - gearbox_tz->parent->polling_delay); + 
gearbox_tz->tzdev = thermal_zone_device_register_with_trips(dev, tz_name, + gearbox_tz->trips, + MLXSW_THERMAL_NUM_TRIPS, + gearbox_tz, + &mlxsw_thermal_gearbox_ops, + &mlxsw_thermal_params, 0, + polling_delay); if (IS_ERR(gearbox_tz->tzdev)) return PTR_ERR(gearbox_tz->tzdev); @@ -709,13 +711,12 @@ int mlxsw_thermal_init(struct mlxsw_core *core, MLXSW_THERMAL_SLOW_POLL_INT : MLXSW_THERMAL_POLL_INT; - thermal->tzdev = thermal_zone_device_register_with_trips("mlxsw", - thermal->trips, - MLXSW_THERMAL_NUM_TRIPS, - thermal, - &mlxsw_thermal_ops, - &mlxsw_thermal_params, 0, - thermal->polling_delay); + thermal->tzdev = thermal_zone_device_register_with_trips(dev, "mlxsw", + thermal->trips, + MLXSW_THERMAL_NUM_TRIPS, + thermal, &mlxsw_thermal_ops, + &mlxsw_thermal_params, 0, + thermal->polling_delay); if (IS_ERR(thermal->tzdev)) { err = PTR_ERR(thermal->tzdev); dev_err(dev, "Failed to register thermal zone\n"); diff --git a/drivers/net/wireless/intel/iwlwifi/mld/thermal.c b/drivers/net/wireless/intel/iwlwifi/mld/thermal.c index 9e56e6e80ab7..56a0022d33db 100644 --- a/drivers/net/wireless/intel/iwlwifi/mld/thermal.c +++ b/drivers/net/wireless/intel/iwlwifi/mld/thermal.c @@ -256,7 +256,7 @@ static void iwl_mld_thermal_zone_register(struct iwl_mld *mld) sprintf(name, "iwlwifi_%u", atomic_inc_return(&counter) & 0xFF); mld->tzone = - thermal_zone_device_register_with_trips(name, trips, + thermal_zone_device_register_with_trips(mld->dev, name, trips, IWL_MAX_DTS_TRIPS, mld, &tzone_ops, NULL, 0, 0); diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c index b184f08230b9..e4777b815976 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c @@ -672,11 +672,11 @@ static void iwl_mvm_thermal_zone_register(struct iwl_mvm *mvm) mvm->tz_device.trips[i].type = THERMAL_TRIP_PASSIVE; mvm->tz_device.trips[i].flags = THERMAL_TRIP_FLAG_RW_TEMP; } - mvm->tz_device.tzone = 
thermal_zone_device_register_with_trips(name, - mvm->tz_device.trips, - IWL_MAX_DTS_TRIPS, - mvm, &tzone_ops, - NULL, 0, 0); + mvm->tz_device.tzone = thermal_zone_device_register_with_trips(mvm->dev, name, + mvm->tz_device.trips, + IWL_MAX_DTS_TRIPS, + mvm, &tzone_ops, + NULL, 0, 0); if (IS_ERR(mvm->tz_device.tzone)) { IWL_DEBUG_TEMP(mvm, "Failed to register to thermal zone (err = %ld)\n", diff --git a/drivers/platform/x86/acerhdf.c b/drivers/platform/x86/acerhdf.c index c74937d475e5..abdb5749c169 100644 --- a/drivers/platform/x86/acerhdf.c +++ b/drivers/platform/x86/acerhdf.c @@ -656,7 +656,7 @@ static int __init acerhdf_register_thermal(void) if (IS_ERR(cl_dev)) return -EINVAL; - thz_dev = thermal_zone_device_register_with_trips("acerhdf", trips, ARRAY_SIZE(trips), + thz_dev = thermal_zone_device_register_with_trips(NULL, "acerhdf", trips, ARRAY_SIZE(trips), NULL, &acerhdf_dev_ops, &acerhdf_zone_params, 0, (kernelmode) ? interval*1000 : 0); diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c index 9a28381e2607..cbc4bed17efa 100644 --- a/drivers/power/supply/power_supply_core.c +++ b/drivers/power/supply/power_supply_core.c @@ -1531,8 +1531,8 @@ static int psy_register_thermal(struct power_supply *psy) struct thermal_zone_params tzp = { .no_hwmon = IS_ENABLED(CONFIG_POWER_SUPPLY_HWMON) }; - psy->tzd = thermal_tripless_zone_device_register(psy->desc->name, - psy, &psy_tzd_ops, &tzp); + psy->tzd = thermal_tripless_zone_device_register(&psy->dev, psy->desc->name, psy, + &psy_tzd_ops, &tzp); if (IS_ERR(psy->tzd)) return PTR_ERR(psy->tzd); ret = thermal_zone_device_enable(psy->tzd); diff --git a/drivers/thermal/armada_thermal.c b/drivers/thermal/armada_thermal.c index c2fbdb534f61..fc60b0bab627 100644 --- a/drivers/thermal/armada_thermal.c +++ b/drivers/thermal/armada_thermal.c @@ -871,7 +871,7 @@ static int armada_thermal_probe(struct platform_device *pdev) /* Wait the sensors to be valid */ 
armada_wait_sensor_validity(priv); - tz = thermal_tripless_zone_device_register(priv->zone_name, + tz = thermal_tripless_zone_device_register(&pdev->dev, priv->zone_name, priv, &legacy_ops, NULL); if (IS_ERR(tz)) { diff --git a/drivers/thermal/da9062-thermal.c b/drivers/thermal/da9062-thermal.c index a8d4b766ba21..c5368c2b53b9 100644 --- a/drivers/thermal/da9062-thermal.c +++ b/drivers/thermal/da9062-thermal.c @@ -196,7 +196,7 @@ static int da9062_thermal_probe(struct platform_device *pdev) INIT_DELAYED_WORK(&thermal->work, da9062_thermal_poll_on); mutex_init(&thermal->lock); - thermal->zone = thermal_zone_device_register_with_trips(thermal->config->name, + thermal->zone = thermal_zone_device_register_with_trips(&pdev->dev, thermal->config->name, trips, ARRAY_SIZE(trips), thermal, &da9062_thermal_ops, NULL, pp_tmp, 0); diff --git a/drivers/thermal/dove_thermal.c b/drivers/thermal/dove_thermal.c index 723bc72f0626..101c6109b04a 100644 --- a/drivers/thermal/dove_thermal.c +++ b/drivers/thermal/dove_thermal.c @@ -139,7 +139,7 @@ static int dove_thermal_probe(struct platform_device *pdev) return ret; } - thermal = thermal_tripless_zone_device_register("dove_thermal", priv, + thermal = thermal_tripless_zone_device_register(&pdev->dev, "dove_thermal", priv, &ops, NULL); if (IS_ERR(thermal)) { dev_err(&pdev->dev, diff --git a/drivers/thermal/imx_thermal.c b/drivers/thermal/imx_thermal.c index 38c993d1bcb3..043e80756017 100644 --- a/drivers/thermal/imx_thermal.c +++ b/drivers/thermal/imx_thermal.c @@ -679,7 +679,7 @@ static int imx_thermal_probe(struct platform_device *pdev) goto legacy_cleanup; } - data->tz = thermal_zone_device_register_with_trips("imx_thermal_zone", + data->tz = thermal_zone_device_register_with_trips(dev, "imx_thermal_zone", trips, ARRAY_SIZE(trips), data, diff --git a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c index 41d3bc3ed8a2..ed21da8f0a47 100644 --- 
a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c +++ b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c @@ -594,7 +594,7 @@ static int int3400_thermal_probe(struct platform_device *pdev) evaluate_odvp(priv); - priv->thermal = thermal_tripless_zone_device_register("INT3400 Thermal", priv, + priv->thermal = thermal_tripless_zone_device_register(&pdev->dev, "INT3400 Thermal", priv, &int3400_thermal_ops, &int3400_thermal_params); if (IS_ERR(priv->thermal)) { diff --git a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c b/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c index 3d9efe69d562..3adccb7fc157 100644 --- a/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c +++ b/drivers/thermal/intel/int340x_thermal/int340x_thermal_zone.c @@ -160,13 +160,12 @@ struct int34x_thermal_zone *int340x_thermal_zone_add(struct acpi_device *adev, int34x_zone->lpat_table = acpi_lpat_get_conversion_table(adev->handle); - int34x_zone->zone = thermal_zone_device_register_with_trips( - acpi_device_bid(adev), - zone_trips, trip_cnt, - int34x_zone, - &zone_ops, - &int340x_thermal_params, - 0, 0); + int34x_zone->zone = thermal_zone_device_register_with_trips(&adev->dev, + acpi_device_bid(adev), + zone_trips, trip_cnt, + int34x_zone, &zone_ops, + &int340x_thermal_params, + 0, 0); kfree(zone_trips); if (IS_ERR(int34x_zone->zone)) { diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c index 0d4dcc66e097..2b3116e23fa1 100644 --- a/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c +++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c @@ -371,10 +371,9 @@ static int proc_thermal_pci_probe(struct pci_dev *pdev, const struct pci_device_ psv_trip.temperature = get_trip_temp(pci_info); - pci_info->tzone = thermal_zone_device_register_with_trips("TCPU_PCI", &psv_trip, - 1, pci_info, - &tzone_ops, - &tzone_params, 0, 
0); + pci_info->tzone = thermal_zone_device_register_with_trips(&pdev->dev, "TCPU_PCI", &psv_trip, + 1, pci_info, &tzone_ops, + &tzone_params, 0, 0); if (IS_ERR(pci_info->tzone)) { ret = PTR_ERR(pci_info->tzone); goto err_del_legacy; diff --git a/drivers/thermal/intel/intel_pch_thermal.c b/drivers/thermal/intel/intel_pch_thermal.c index fc326985796c..754527b2b09a 100644 --- a/drivers/thermal/intel/intel_pch_thermal.c +++ b/drivers/thermal/intel/intel_pch_thermal.c @@ -235,7 +235,7 @@ static int intel_pch_thermal_probe(struct pci_dev *pdev, nr_trips += pch_wpt_add_acpi_psv_trip(ptd, &ptd_trips[nr_trips]); - ptd->tzd = thermal_zone_device_register_with_trips(board_names[board_id], + ptd->tzd = thermal_zone_device_register_with_trips(&pdev->dev, board_names[board_id], ptd_trips, nr_trips, ptd, &tzd_ops, NULL, 0, 0); diff --git a/drivers/thermal/intel/intel_quark_dts_thermal.c b/drivers/thermal/intel/intel_quark_dts_thermal.c index 89498eb29a89..d8d38b6ed452 100644 --- a/drivers/thermal/intel/intel_quark_dts_thermal.c +++ b/drivers/thermal/intel/intel_quark_dts_thermal.c @@ -376,7 +376,7 @@ static struct soc_sensor_entry *alloc_soc_dts(void) trips[QRK_DTS_ID_TP_HOT].temperature = get_trip_temp(QRK_DTS_ID_TP_HOT); trips[QRK_DTS_ID_TP_HOT].type = THERMAL_TRIP_HOT; - aux_entry->tzone = thermal_zone_device_register_with_trips("quark_dts", + aux_entry->tzone = thermal_zone_device_register_with_trips(NULL, "quark_dts", trips, QRK_MAX_DTS_TRIPS, aux_entry, diff --git a/drivers/thermal/intel/intel_soc_dts_iosf.c b/drivers/thermal/intel/intel_soc_dts_iosf.c index ea87439fe7a9..74638dac75e6 100644 --- a/drivers/thermal/intel/intel_soc_dts_iosf.c +++ b/drivers/thermal/intel/intel_soc_dts_iosf.c @@ -230,7 +230,7 @@ static int add_dts_thermal_zone(int id, struct intel_soc_dts_sensor_entry *dts, } } snprintf(name, sizeof(name), "soc_dts%d", id); - dts->tzone = thermal_zone_device_register_with_trips(name, trips, + dts->tzone = thermal_zone_device_register_with_trips(NULL, name, 
trips, SOC_MAX_DTS_TRIPS, dts, &tzone_ops, NULL, 0, 0); diff --git a/drivers/thermal/intel/x86_pkg_temp_thermal.c b/drivers/thermal/intel/x86_pkg_temp_thermal.c index 3fc679b6f11b..807126dc4bea 100644 --- a/drivers/thermal/intel/x86_pkg_temp_thermal.c +++ b/drivers/thermal/intel/x86_pkg_temp_thermal.c @@ -342,9 +342,9 @@ static int pkg_temp_thermal_device_add(unsigned int cpu) INIT_DELAYED_WORK(&zonedev->work, pkg_temp_thermal_threshold_work_fn); zonedev->cpu = cpu; - zonedev->tzone = thermal_zone_device_register_with_trips("x86_pkg_temp", - trips, thres_count, - zonedev, &tzone_ops, &pkg_temp_tz_params, 0, 0); + zonedev->tzone = thermal_zone_device_register_with_trips(NULL, "x86_pkg_temp", trips, + thres_count, zonedev, &tzone_ops, + &pkg_temp_tz_params, 0, 0); if (IS_ERR(zonedev->tzone)) { err = PTR_ERR(zonedev->tzone); goto out_kfree_zonedev; diff --git a/drivers/thermal/kirkwood_thermal.c b/drivers/thermal/kirkwood_thermal.c index 4619e090f756..4827ad2bdb49 100644 --- a/drivers/thermal/kirkwood_thermal.c +++ b/drivers/thermal/kirkwood_thermal.c @@ -71,7 +71,7 @@ static int kirkwood_thermal_probe(struct platform_device *pdev) if (IS_ERR(priv->sensor)) return PTR_ERR(priv->sensor); - thermal = thermal_tripless_zone_device_register("kirkwood_thermal", + thermal = thermal_tripless_zone_device_register(&pdev->dev, "kirkwood_thermal", priv, &ops, NULL); if (IS_ERR(thermal)) { dev_err(&pdev->dev, diff --git a/drivers/thermal/renesas/rcar_thermal.c b/drivers/thermal/renesas/rcar_thermal.c index fdd7afdc4ff6..3d228e4c7b09 100644 --- a/drivers/thermal/renesas/rcar_thermal.c +++ b/drivers/thermal/renesas/rcar_thermal.c @@ -488,10 +488,12 @@ static int rcar_thermal_probe(struct platform_device *pdev) dev, i, priv, &rcar_thermal_zone_ops); } else { - priv->zone = thermal_zone_device_register_with_trips( - "rcar_thermal", trips, ARRAY_SIZE(trips), priv, - &rcar_thermal_zone_ops, NULL, 0, - idle); + priv->zone = thermal_zone_device_register_with_trips(dev, "rcar_thermal", + 
trips, + ARRAY_SIZE(trips), + priv, + &rcar_thermal_zone_ops, + NULL, 0, idle); ret = thermal_zone_device_enable(priv->zone); if (ret) { diff --git a/drivers/thermal/spear_thermal.c b/drivers/thermal/spear_thermal.c index 603dadcd3df5..c5bba9d600d4 100644 --- a/drivers/thermal/spear_thermal.c +++ b/drivers/thermal/spear_thermal.c @@ -122,7 +122,7 @@ static int spear_thermal_probe(struct platform_device *pdev) stdev->flags = val; writel_relaxed(stdev->flags, stdev->thermal_base); - spear_thermal = thermal_tripless_zone_device_register("spear_thermal", + spear_thermal = thermal_tripless_zone_device_register(&pdev->dev, "spear_thermal", stdev, &ops, NULL); if (IS_ERR(spear_thermal)) { dev_err(&pdev->dev, "thermal zone device is NULL\n"); diff --git a/drivers/thermal/testing/zone.c b/drivers/thermal/testing/zone.c index c12c405225bb..5a7e9969582e 100644 --- a/drivers/thermal/testing/zone.c +++ b/drivers/thermal/testing/zone.c @@ -402,7 +402,7 @@ static int tt_zone_register_tz(struct tt_thermal_zone *tt_zone) tt_zone->tz_temp = tt_zone->temp; - tz = thermal_zone_device_register_with_trips("test_tz", trips, i, tt_zone, + tz = thermal_zone_device_register_with_trips(NULL, "test_tz", trips, i, tt_zone, &tt_zone_ops, NULL, 0, 0); if (IS_ERR(tz)) return PTR_ERR(tz); diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 92e51d2e4535..9d8499999579 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -1475,6 +1475,7 @@ static void thermal_zone_init_complete(struct thermal_zone_device *tz) /** * thermal_zone_device_register_with_trips() - register a new thermal zone device + * @parent: parent device pointer * @type: the thermal zone device type * @trips: a pointer to an array of thermal trips * @num_trips: the number of trip points the thermal zone support @@ -1498,7 +1499,7 @@ static void thermal_zone_init_complete(struct thermal_zone_device *tz) * IS_ERR*() helpers. 
*/ struct thermal_zone_device * -thermal_zone_device_register_with_trips(const char *type, +thermal_zone_device_register_with_trips(struct device *parent, const char *type, const struct thermal_trip *trips, int num_trips, void *devdata, const struct thermal_zone_device_ops *ops, @@ -1576,6 +1577,7 @@ thermal_zone_device_register_with_trips(const char *type, tz->ops.critical = thermal_zone_device_critical; tz->device.class = thermal_class; + tz->device.parent = parent; tz->devdata = devdata; tz->num_trips = num_trips; for_each_trip_desc(tz, td) { @@ -1651,12 +1653,13 @@ thermal_zone_device_register_with_trips(const char *type, EXPORT_SYMBOL_GPL(thermal_zone_device_register_with_trips); struct thermal_zone_device *thermal_tripless_zone_device_register( + struct device *parent, const char *type, void *devdata, const struct thermal_zone_device_ops *ops, const struct thermal_zone_params *tzp) { - return thermal_zone_device_register_with_trips(type, NULL, 0, devdata, + return thermal_zone_device_register_with_trips(parent, type, NULL, 0, devdata, ops, tzp, 0, 0); } EXPORT_SYMBOL_GPL(thermal_tripless_zone_device_register); diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c index 1a51a4d240ff..e3359ca20d77 100644 --- a/drivers/thermal/thermal_of.c +++ b/drivers/thermal/thermal_of.c @@ -354,6 +354,7 @@ static void thermal_of_zone_unregister(struct thermal_zone_device *tz) * zone properties and registers new thermal zone with those * properties. 
* + * @parent: parent device pointer * @sensor: A device node pointer corresponding to the sensor in the device tree * @id: An integer as sensor identifier * @data: A private data to be stored in the thermal zone dedicated private area @@ -364,7 +365,9 @@ static void thermal_of_zone_unregister(struct thermal_zone_device *tz) * - ENOMEM: if one structure can not be allocated * - Other negative errors are returned by the underlying called functions */ -static struct thermal_zone_device *thermal_of_zone_register(struct device_node *sensor, int id, void *data, +static struct thermal_zone_device *thermal_of_zone_register(struct device *parent, + struct device_node *sensor, + int id, void *data, const struct thermal_zone_device_ops *ops) { struct thermal_zone_device_ops of_ops = *ops; @@ -412,7 +415,7 @@ static struct thermal_zone_device *thermal_of_zone_register(struct device_node * of_ops.critical = thermal_zone_device_critical_shutdown; } - tz = thermal_zone_device_register_with_trips(np->name, trips, ntrips, + tz = thermal_zone_device_register_with_trips(parent, np->name, trips, ntrips, data, &of_ops, &tzp, pdelay, delay); if (IS_ERR(tz)) { @@ -478,7 +481,7 @@ struct thermal_zone_device *devm_thermal_of_zone_register(struct device *dev, in if (!ptr) return ERR_PTR(-ENOMEM); - tzd = thermal_of_zone_register(dev->of_node, sensor_id, data, ops); + tzd = thermal_of_zone_register(dev, dev->of_node, sensor_id, data, ops); if (IS_ERR(tzd)) { devres_free(ptr); return tzd; diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 29a608bf5f80..0c5a91313bd5 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -226,6 +226,7 @@ int thermal_zone_get_crit_temp(struct thermal_zone_device *tz, int *temp); #ifdef CONFIG_THERMAL struct thermal_zone_device *thermal_zone_device_register_with_trips( + struct device *parent, const char *type, const struct thermal_trip *trips, int num_trips, void *devdata, @@ -235,6 +236,7 @@ struct thermal_zone_device 
*thermal_zone_device_register_with_trips( unsigned int polling_delay); struct thermal_zone_device *thermal_tripless_zone_device_register( + struct device *parent, const char *type, void *devdata, const struct thermal_zone_device_ops *ops, @@ -276,6 +278,7 @@ int thermal_zone_device_disable(struct thermal_zone_device *tz); void thermal_zone_device_critical(struct thermal_zone_device *tz); #else static inline struct thermal_zone_device *thermal_zone_device_register_with_trips( + struct device *parent, const char *type, const struct thermal_trip *trips, int num_trips, void *devdata, @@ -285,6 +288,7 @@ static inline struct thermal_zone_device *thermal_zone_device_register_with_trip { return ERR_PTR(-ENODEV); } static inline struct thermal_zone_device *thermal_tripless_zone_device_register( + struct device *parent, const char *type, void *devdata, struct thermal_zone_device_ops *ops, -- 2.39.5 From W_Armin at gmx.de Wed Nov 19 19:41:17 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Thu, 20 Nov 2025 04:41:17 +0100 Subject: [PATCH RFC RESEND 7/8] ACPI: thermal: Stop creating "device" sysfs link In-Reply-To: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> Message-ID: <20251120-thermal-device-v1-7-bbdad594d57a@gmx.de> The thermal core will soon automatically create sysfs links between the thermal zone device and its parent device. Stop manually creating the "device" sysfs link between the thermal zone device and the parent device to avoid a name collision. The "thermal_zone" sysfs link however stays for backwards compatibility, as it does not suffer from a name collision. 
Signed-off-by: Armin Wolf
---
 drivers/acpi/thermal.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index a511f9ea0267..99ad67bbd764 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -592,27 +592,14 @@ static const struct thermal_zone_device_ops acpi_thermal_zone_ops = {
 static int acpi_thermal_zone_sysfs_add(struct acpi_thermal *tz)
 {
 	struct device *tzdev = thermal_zone_device(tz->thermal_zone);
-	int ret;
 
-	ret = sysfs_create_link(&tz->device->dev.kobj,
-				&tzdev->kobj, "thermal_zone");
-	if (ret)
-		return ret;
-
-	ret = sysfs_create_link(&tzdev->kobj,
-				&tz->device->dev.kobj, "device");
-	if (ret)
-		sysfs_remove_link(&tz->device->dev.kobj, "thermal_zone");
-
-	return ret;
+	/* For backwards compatibility */
+	return sysfs_create_link(&tz->device->dev.kobj, &tzdev->kobj, "thermal_zone");
 }
 
 static void acpi_thermal_zone_sysfs_remove(struct acpi_thermal *tz)
 {
-	struct device *tzdev = thermal_zone_device(tz->thermal_zone);
-
 	sysfs_remove_link(&tz->device->dev.kobj, "thermal_zone");
-	sysfs_remove_link(&tzdev->kobj, "device");
 }
 
 static int acpi_thermal_register_thermal_zone(struct acpi_thermal *tz,
-- 
2.39.5

From frut3k7 at gmail.com Fri Nov 21 05:28:25 2025
From: frut3k7 at gmail.com (Paweł Owoc)
Date: Fri, 21 Nov 2025 14:28:25 +0100
Subject: New staging repos for ath1*k firmware
In-Reply-To:
Message-ID: <20251121132825.1663248-1-frut3k7@gmail.com>

Hi,

is it possible to release the new firmware version (2.12) for the IPQ8074?
There's a problem with the latest ath11k driver on version 2.9:
https://lore.kernel.org/linux-wireless/CAKEyCaD8RMqPvwZOxgwAT4G=h-M94ToxoSdYwCjfvZMiM8mB-g at mail.gmail.com/

Regards,

From rafael at kernel.org Fri Nov 21 12:35:47 2025
From: rafael at kernel.org (Rafael J.
Wysocki)
Date: Fri, 21 Nov 2025 21:35:47 +0100
Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices
In-Reply-To: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de>
References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de>
Message-ID:

On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote:
>
> Drivers registering thermal zone/cooling devices are currently unable
> to tell the thermal core what parent device the new thermal zone/
> cooling device should have, potentially causing issues with suspend
> ordering

This is one potential class of problems that may arise, but I would
like to see a real example of this.

As it stands today, thermal_class has no PM callbacks, so there are no
callback execution ordering issues with devices in that class and what
other suspend/resume ordering issues are there?

Also, the suspend and resume of thermal zones is handled via PM
notifiers. Is there a problem with this?

> and making it impossible for user space applications to
> associate a given thermal zone device with its parent device.

Why does user space need to know the parent of a given cooling device
or thermal zone?

> This patch series aims to fix this issue by extending the functions
> used to register thermal zone/cooling devices to also accept a parent
> device pointer. The first six patches convert all functions used for
> registering cooling devices, while the functions used for registering
> thermal zone devices are converted by the remaining two patches.
>
> I tested this series on various devices containing (among others):
> - ACPI thermal zones
> - ACPI processor devices
> - PCIe cooling devices
> - Intel Wifi card
> - Intel powerclamp
> - Intel TCC cooling

What exactly did you do to test it?

> I also compile-tested the remaining affected drivers, however i would > still be happy if the relevant maintainers (especially those of the > mellanox ethernet switch driver) could take a quick glance at the > code and verify that i am using the correct device as the parent > device. I think that the above paragraph is not relevant any more? > This work is also necessary for extending the ACPI thermal zone driver > to support the _TZD ACPI object in the future. I'm still unsure why _TZD support requires the ability to set a thermal zone parent device. > Signed-off-by: Armin Wolf > --- > Armin Wolf (8): > thermal: core: Allow setting the parent device of cooling devices > thermal: core: Set parent device in thermal_of_cooling_device_register() > ACPI: processor: Stop creating "device" sysfs link That link is not to the cooling devices' parent, but to the ACPI device object (a struct acpi_device) that corresponds to the parent. The parent of the cooling device should be the processor device, not its ACPI companion, so I'm not sure why there would be a conflict. > ACPI: fan: Stop creating "device" sysfs link > ACPI: video: Stop creating "device" sysfs link Analogously in the above two cases AFAICS. The parent of a cooling device should be a "physical" device object, like a platform device or a PCI device or similar, not a struct acpi_device (which in fact is not a device even). > thermal: core: Set parent device in thermal_cooling_device_register() > ACPI: thermal: Stop creating "device" sysfs link And this link is to the struct acpi_device representing the thermal zone itself. > thermal: core: Allow setting the parent device of thermal zone devices I'm not sure if this is a good idea, at least until it is clear what the role of a thermal zone parent device should be. 
From W_Armin at gmx.de Sat Nov 22 06:18:11 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Sat, 22 Nov 2025 15:18:11 +0100 Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices In-Reply-To: References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> Message-ID: <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de> On 21.11.25 at 21:35, Rafael J. Wysocki wrote: > On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote: >> Drivers registering thermal zone/cooling devices are currently unable >> to tell the thermal core what parent device the new thermal zone/ >> cooling device should have, potentially causing issues with suspend >> ordering > This is one potential class of problems that may arise, but I would > like to see a real example of this. > > As it stands today, thermal_class has no PM callbacks, so there are no > callback execution ordering issues with devices in that class and what > other suspend/resume ordering issues are there? Correct, that is why I said "potentially". > > Also, the suspend and resume of thermal zones is handled via PM > notifiers. Is there a problem with this? The problem with PM notifiers is that thermal zones stop working even before user space is frozen. Freezing user space might take a lot of time, so having no thermal management during this period is less than ideal. This problem would not occur when using dev_pm_ops, as thermal zones would be suspended after user space has been frozen successfully. Additionally, when using dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates that no new devices (including thermal zones and cooling devices) be registered during a suspend/resume cycle. Replacing the PM notifiers with dev_pm_ops would of course be an optimization with its own patch series. >> and making it impossible for user space applications to >> associate a given thermal zone device with its parent device.
> Why does user space need to know the parent of a given cooling device > or thermal zone? Let's say that we have two thermal zones registered by two instances of the Intel Wifi driver. User space is currently unable to find out which thermal zone belongs to which Wifi adapter, as both thermal zones have (nearly) the same type string ("iwlwifi[0-X]"). This problem would be solved once we populate the parent device pointer inside the thermal zone device, as user space could then simply look at the "device" symlink to determine the parent device behind a given thermal zone device. Additionally, being able to access the acpi_handle of the parent device will be necessary for the ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors. >> This patch series aims to fix this issue by extending the functions >> used to register thermal zone/cooling devices to also accept a parent >> device pointer. The first six patches convert all functions used for >> registering cooling devices, while the functions used for registering >> thermal zone devices are converted by the remaining two patches. >> >> I tested this series on various devices containing (among others): >> - ACPI thermal zones >> - ACPI processor devices >> - PCIe cooling devices >> - Intel Wifi card >> - Intel powerclamp >> - Intel TCC cooling > What exactly did you do to test it? I tested: - the thermal zone temperature readout - correctness of the new sysfs links - suspend/resume I also verified that ACPI thermal zones still bind with the ACPI fans. >> I also compile-tested the remaining affected drivers, however i would >> still be happy if the relevant maintainers (especially those of the >> mellanox ethernet switch driver) could take a quick glance at the >> code and verify that i am using the correct device as the parent >> device. > I think that the above paragraph is not relevant any more?
You are right, however I originally meant to CC the Mellanox maintainers as I was a bit unsure about the changes I made to their driver. I will rework this section in the next revision and CC the Mellanox maintainers. > >> This work is also necessary for extending the ACPI thermal zone driver >> to support the _TZD ACPI object in the future. > I'm still unsure why _TZD support requires the ability to set a > thermal zone parent device. _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans and ACPI processors, like ACPI batteries. This however will currently not work as the ACPI thermal zone driver uses the private drvdata of the cooling device to determine if said cooling device should bind. This only works for ACPI fans and processors because those drivers store an ACPI device pointer inside drvdata, something the ACPI thermal zone expects. As we cannot require all cooling devices to store an ACPI device pointer inside their drvdata field in order to support ACPI, we must use a more generic approach. I was thinking about using the acpi_handle of the parent device instead of messing with the drvdata field, but this only works if the parent device pointer of the cooling device is populated. (Cooling devices without a parent device would then be ignored by the ACPI thermal zone driver, as such cooling devices cannot be linked to ACPI). > >> Signed-off-by: Armin Wolf >> --- >> Armin Wolf (8): >> thermal: core: Allow setting the parent device of cooling devices >> thermal: core: Set parent device in thermal_of_cooling_device_register() >> ACPI: processor: Stop creating "device" sysfs link > That link is not to the cooling devices' parent, but to the ACPI > device object (a struct acpi_device) that corresponds to the parent. > The parent of the cooling device should be the processor device, not > its ACPI companion, so I'm not sure why there would be a conflict.
From the perspective of the Linux device core, a parent device does not have to be a "physical" device. In the case of the ACPI processor driver, the ACPI device is used, so the cooling device registered by said driver belongs to the ACPI device. I agree that using the Linux processor device would make more sense, but this will require changes inside the ACPI processor driver. As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks (the one created by the ACPI processor driver and the one created by the device core) will be created in the same directory (which is the directory of the cooling device). >> ACPI: fan: Stop creating "device" sysfs link >> ACPI: video: Stop creating "device" sysfs link > Analogously in the above two cases AFAICS. > > The parent of a cooling device should be a "physical" device object, > like a platform device or a PCI device or similar, not a struct > acpi_device (which in fact is not a device even). From the perspective of the Linux device core, an ACPI device is a perfectly valid device. I agree that using a platform device or PCI device is better, but this already happens inside the ACPI fan driver (platform device). Only the ACPI video driver created a "device" sysfs link that points to the ACPI device instead of the PCI device. I just noticed that I accidentally changed this by using the PCI device as the parent device for the cooling device. If you want, we can keep this change. >> thermal: core: Set parent device in thermal_cooling_device_register() >> ACPI: thermal: Stop creating "device" sysfs link > And this link is to the struct acpi_device representing the thermal zone itself. Correct, the ACPI thermal zone driver is an ACPI driver, meaning that it binds to ACPI devices. Because of this, all (thermal zone) devices created by an instance of said driver are descendants of the ACPI device said instance is bound to.
We can of course convert the ACPI thermal zone driver into a platform driver, but this would be a separate patch series. >> thermal: core: Allow setting the parent device of thermal zone devices > I'm not sure if this is a good idea, at least until it is clear what > the role of a thermal zone parent device should be. Take a look at my explanation with the Intel Wifi driver. Thanks, Armin Wolf From sajattack at postmarketos.org Mon Nov 24 22:59:04 2025 From: sajattack at postmarketos.org (Paul Sajna) Date: Mon, 24 Nov 2025 22:59:04 -0800 Subject: [PATCH] ath10k: Introduce a devicetree quirk to skip host cap QMI requests In-Reply-To: <1601058581-19461-1-git-send-email-amit.pundir@linaro.org> References: <1601058581-19461-1-git-send-email-amit.pundir@linaro.org> Message-ID: +1 on this from me. I need it for my patch-series for LG G7 ThinQ (judyln) as well. https://lore.kernel.org/all/20250928-judyln-dts-v3-0-b14cf9e9a928 at postmarketos.org/T/#m90e8087d4388e588b71a0eff01b88f1721f73b73 From sajattack at postmarketos.org Mon Nov 24 23:03:02 2025 From: sajattack at postmarketos.org (Paul Sajna) Date: Mon, 24 Nov 2025 23:03:02 -0800 Subject: [PATCH v2 0/3] ath10k: Introduce a devicetree quirk to skip host cap QMI requests In-Reply-To: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz> References: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz> Message-ID: <33a8c6be-6946-41e3-aad7-fcd572e32a66@postmarketos.org> Also required for https://lore.kernel.org/all/20250928-judyln-dts-v3-0-b14cf9e9a928 at postmarketos.org/T/#m90e8087d4388e588b71a0eff01b88f1721f73b73 From david at ixit.cz Tue Nov 25 01:27:33 2025 From: david at ixit.cz (David Heidelberg) Date: Tue, 25 Nov 2025 10:27:33 +0100 Subject: [PATCH 1/2] ath10k: Introduce a firmware quirk to skip host cap QMI requests In-Reply-To: <20251111-xiaomi-beryllium-firmware-v1-1-836b9c51ad86@ixit.cz> References: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz> 
<20251111-xiaomi-beryllium-firmware-v1-1-836b9c51ad86@ixit.cz> Message-ID: <313b36d3-e1b4-4e80-8d5d-d65981abb34b@ixit.cz> Sadly, this is too early in the initialization process and we get NULL deref, similar to [1]. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=000000010f838000 [0000000000000058] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP Modules linked in: qrtr_smd fastrpc rpmsg_ctrl des_generic algif_skcipher md5 md4 algif_hash snd_soc_sdm845 snd_soc_rt5663 snd_soc_qcom_sdw snd_soc_qcom_common snd_soc_rl6231 hci_uart snd_soc_core nft_reject_inet nf_reject_ipv4 btqca nf_reject_ipv6 nft_reject btbcm snd_compress nft_ct bluetooth nf_conntrack nxp_nci_i2c snd_pcm nxp_nci nf_defrag_ipv6 ecdh_generic nf_defrag_ipv4 nci snd_timer ecc soundwire_bus nfc pwrseq_core rmi_i2c snd nf_tables qcom_camss venus_core qcom_spmi_haptics soundcore rmi_core leds_qcom_flash videobuf2_dma_sg qcom_spmi_rradc ath10k_snoc bq27xxx_battery_i2c videobuf2_memops v4l2_mem2mem qcom_smbx bq27xxx_battery rtc_pm8xxx v4l2_fwnode videobuf2_v4l2 ath10k_core videobuf2_common v4l2_async ath qcom_refgen_regulator qcom_stats videodev reset_qcom_pdc mac80211 mc camcc_sdm845 i2c_qcom_cci coresight_tmc qcom_rng coresight_stm stm_core coresight_replicator coresight_funnel qcom_q6v5_mss coresight cfg80211 qrtr ipa qcom_q6v5_pas slim_qcom_ngd_ctrl rfkill qcom_pil_info qcom_wdt qcom_q6v5 qcom_sysmon qcom_common qcom_glink_smem icc_bwmon uhid uinput zram zsmalloc fuse nfnetlink ipv6 CPU: 4 UID: 0 PID: 154 Comm: kworker/u32:7 Tainted: G W 
6.18.0-rc5-next-20251111-sdm845-00134-gfb2106976a5c-dirty #2 PREEMPT Tainted: [W]=WARN Hardware name: OnePlus 6T (DT) Workqueue: ath10k_qmi_driver_event ath10k_qmi_driver_event_work [ath10k_snoc] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : ath10k_qmi_driver_event_work+0x1ec/0x440 [ath10k_snoc] lr : ath10k_qmi_driver_event_work+0x1dc/0x440 [ath10k_snoc] sp : ffff8000819b3cf0 x29: ffff8000819b3d40 x28: ffff00008d823c00 x27: dead000000000122 x26: 0000000000000000 x25: ffff00008fab2060 x24: dead000000000100 x23: ffff00008d823d50 x22: ffff00008d81bd28 x21: ffff00008d823d28 x20: ffff00008d823d28 x19: ffff0000901c5120 x18: ffff56858e1da000 x17: ffff56858e1da000 x16: ffffa97c6467f1b8 x15: ffffa97c6569dbd0 x14: ffffa97c655a1440 x13: 0000000000000000 x12: ffff00008a12e4a8 x11: ffff00008d823cd8 x10: ffff00008a12e480 x9 : ffffa97c640314c4 x8 : ffff00008d823cd8 x7 : 0000000000000000 x6 : ffff00008a12e6a8 x5 : fffffffffffffffe x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: ath10k_qmi_driver_event_work+0x1ec/0x440 [ath10k_snoc] (P) process_one_work+0x15c/0x3c0 worker_thread+0x2d0/0x400 kthread+0x148/0x208 ret_from_fork+0x10/0x20 Code: 350001a0 39488380 37000de0 f9487b20 (f9402c00) ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ If no objection raised, I would go back to the original device-tree property way then (as also another device in need of this quirk showed up). David [1] https://lore.kernel.org/ath10k/54ac2295-36b4-49fc-9583-a10db8d9d5d6 at freebox.fr/ On 11/11/2025 13:34, David Heidelberg via B4 Relay wrote: > From: David Heidelberg > > There are firmware versions which do not support host capability > QMI request. We suspect either the host cap is not implemented or > there may be firmware specific issues, but apparently there seem > to be a generation of firmware that has this particular behavior. 
> > For example, firmware build on Xiaomi Poco F1 (sdm845) phone: > "QC_IMAGE_VERSION_STRING=WLAN.HL.2.0.c3-00257-QCAHLSWMTPLZ-1" > > If we do not skip the host cap QMI request on Xiaomi Poco F1, > then we get a QMI_ERR_MALFORMED_MSG_V01 error message in the > ath10k_qmi_host_cap_send_sync(). But this error message is not > fatal to the firmware nor to the ath10k driver and we can still > bring up the WiFi services successfully if we just ignore it. > > Hence introducing this firmware quirk to skip host capability > QMI request for the firmware versions which do not support this > feature. > > Suggested-by: Dmitry Baryshkov > Signed-off-by: David Heidelberg > --- > drivers/net/wireless/ath/ath10k/core.c | 1 + > drivers/net/wireless/ath/ath10k/core.h | 3 +++ > drivers/net/wireless/ath/ath10k/qmi.c | 13 ++++++++++--- > 3 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c > index 7c2939cbde5f0..7602631696798 100644 > --- a/drivers/net/wireless/ath/ath10k/core.c > +++ b/drivers/net/wireless/ath/ath10k/core.c > @@ -773,6 +773,7 @@ static const char *const ath10k_core_fw_feature_str[] = { > [ATH10K_FW_FEATURE_SINGLE_CHAN_INFO_PER_CHANNEL] = "single-chan-info-per-channel", > [ATH10K_FW_FEATURE_PEER_FIXED_RATE] = "peer-fixed-rate", > [ATH10K_FW_FEATURE_IRAM_RECOVERY] = "iram-recovery", > + [ATH10K_FW_FEATURE_NO_HOST_CAP_QMI_REQ] = "no-host-cap-qmi-req", > }; > > static unsigned int ath10k_core_get_fw_feature_str(char *buf, > diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h > index 73a9db302245d..b20541e4046f8 100644 > --- a/drivers/net/wireless/ath/ath10k/core.h > +++ b/drivers/net/wireless/ath/ath10k/core.h > @@ -838,6 +838,9 @@ enum ath10k_fw_features { > /* Firmware support IRAM recovery */ > ATH10K_FW_FEATURE_IRAM_RECOVERY = 22, > > + /* Firmware does not support host capability QMI request */ > + 
ATH10K_FW_FEATURE_NO_HOST_CAP_QMI_REQ = 23, > + > /* keep last */ > ATH10K_FW_FEATURE_COUNT, > }; > diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c > index 8275345631a0b..5dc8ea39372c1 100644 > --- a/drivers/net/wireless/ath/ath10k/qmi.c > +++ b/drivers/net/wireless/ath/ath10k/qmi.c > @@ -819,9 +819,16 @@ static void ath10k_qmi_event_server_arrive(struct ath10k_qmi *qmi) > return; > } > > - ret = ath10k_qmi_host_cap_send_sync(qmi); > - if (ret) > - return; > + /* > + * Skip the host capability request for the firmware versions which > + * do not support this feature. > + */ > + if (!test_bit(ATH10K_FW_FEATURE_NO_HOST_CAP_QMI_REQ, > + ar->running_fw->fw_file.fw_features)) { > + ret = ath10k_qmi_host_cap_send_sync(qmi); > + if (ret) > + return; > + } > > ret = ath10k_qmi_msa_mem_info_send_sync_msg(qmi); > if (ret) > -- David Heidelberg From david at ixit.cz Tue Nov 25 01:29:23 2025 From: david at ixit.cz (David Heidelberg) Date: Tue, 25 Nov 2025 10:29:23 +0100 Subject: [PATCH v2 0/3] ath10k: Introduce a devicetree quirk to skip host cap QMI requests In-Reply-To: References: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz> <2b34ceae-5e31-4dba-93e5-3fa35754fab6@oss.qualcomm.com> Message-ID: <6a3448cf-dd18-4b3d-a8fa-fe282ee779de@ixit.cz> On 10/11/2025 21:41, Dmitry Baryshkov wrote: [...] > I think this should go to the firmware-N file. SNOC platforms now allow > per-platform firmware description files, so it's possible to describe > quirks for the particular firmware file. Since the approach of putting it into the firmware file failed due to early initialization (see https://lore.kernel.org/linux-wireless/20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz/), I am wondering if I should get back to this series. Also, in the meantime Paul found another device [1] in need of this quirk.
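For illustration only, a minimal userspace C model of the check in the quoted hunk with a guard added. It assumes that the NULL pointer behind the oops is ar->running_fw (the oops itself only reports a fault at virtual address 0x58, so this is a guess), and every struct and function name below is a hypothetical simplification, not the driver's real definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the feature enum; only the bit of interest. */
enum fw_feature_model {
    FW_FEATURE_NO_HOST_CAP_QMI_REQ = 23,
};

/* Hypothetical stand-in for the firmware description: one bit per feature. */
struct fw_components_model {
    unsigned long fw_features;
};

/* Hypothetical stand-in for struct ath10k. */
struct ar_model {
    /* Assumption: stays NULL until the firmware file has been parsed. */
    const struct fw_components_model *running_fw;
};

/* Returns true when the host capability request should be sent. */
static bool should_send_host_cap(const struct ar_model *ar)
{
    assert(ar != NULL);

    /* Feature bits not known yet: fall back to the default behaviour. */
    if (ar->running_fw == NULL)
        return true;

    return !(ar->running_fw->fw_features &
             (1UL << FW_FEATURE_NO_HOST_CAP_QMI_REQ));
}
```

With a guard like this the request is simply sent whenever the feature bits are not yet known. That avoids the crash, but it also means the quirk has no effect if the QMI server arrives before the firmware file has been parsed.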
David [1] https://lore.kernel.org/all/20250928-judyln-dts-v3-0-b14cf9e9a928@postmarketos.org/T/#m90e8087d4388e588b71a0eff01b88f1721f73b73 > >> >> So I'm personally OK with this suggested approach. >> >> /jeff > -- David Heidelberg From ernestvanhoecke at gmail.com Tue Nov 25 01:57:53 2025 From: ernestvanhoecke at gmail.com (Ernest Van Hoecke) Date: Tue, 25 Nov 2025 10:57:53 +0100 Subject: [PATCH 0/2] wifi: ath: Use static calibration variant table for devicetree platforms In-Reply-To: References: <20251114-ath-variant-tbl-v1-0-a9adfc49e3f3@oss.qualcomm.com> <2fd84ab2-2e3e-4d05-add5-17930a35fedf@oss.qualcomm.com> Message-ID: On Tue, Nov 18, 2025 at 12:23:20PM +0530, Manivannan Sadhasivam wrote: > > ath12k doesn't seem to require a calibration variant. But even if the user > replaces ath11k chipset with ath10k one, the calibration variant should be the > same as it is platform specific except for WSI. > > - Mani > > -- > மணிவண்ணன் சதாசிவம் > Hi all, Jumping in on this thread to ask about how we should handle variants. We are using the WCN7850 device with the ath12k driver and received three board files for this from Silex, signed by Qualcomm. All three support the same board (SX-PCEBE), where one is the board file to be used for the US/EU/JP and the other two are for the UK/CA, one for higher emissions and one for lower emissions. Since these are needed for regulatory differences but support the same board, we were wondering about your views on how to handle that in mainline. I see that there is no support for the board file selection in the device tree for ath12k, and that there is some discussion on how to handle variants in general. We are using a device tree-based setup and no ACPI. Thanks!
Kind regards, Ernest From dmitry.baryshkov at oss.qualcomm.com Tue Nov 25 02:45:38 2025 From: dmitry.baryshkov at oss.qualcomm.com (Dmitry Baryshkov) Date: Tue, 25 Nov 2025 12:45:38 +0200 Subject: [PATCH 1/2] ath10k: Introduce a firmware quirk to skip host cap QMI requests In-Reply-To: <313b36d3-e1b4-4e80-8d5d-d65981abb34b@ixit.cz> References: <20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86@ixit.cz> <20251111-xiaomi-beryllium-firmware-v1-1-836b9c51ad86@ixit.cz> <313b36d3-e1b4-4e80-8d5d-d65981abb34b@ixit.cz> Message-ID: On Tue, 25 Nov 2025 at 11:27, David Heidelberg wrote: > > Sadly, this is too early in the initialization process and we get NULL > deref, similar to [1]. > [dropped splat] > > If no objection raised, I would go back to the original device-tree > property way then (as also another device in need of this quirk showed up). Please fix the NULL deref instead. This is a property of the firmware rather than a device. > > David > > [1] > https://lore.kernel.org/ath10k/54ac2295-36b4-49fc-9583-a10db8d9d5d6 at freebox.fr/ > > On 11/11/2025 13:34, David Heidelberg via B4 Relay wrote: > > From: David Heidelberg > > > > There are firmware versions which do not support host capability > > QMI request. We suspect either the host cap is not implemented or > > there may be firmware specific issues, but apparently there seem > > to be a generation of firmware that has this particular behavior. > > > > For example, firmware build on Xiaomi Poco F1 (sdm845) phone: > > "QC_IMAGE_VERSION_STRING=WLAN.HL.2.0.c3-00257-QCAHLSWMTPLZ-1" > > > > If we do not skip the host cap QMI request on Xiaomi Poco F1, > > then we get a QMI_ERR_MALFORMED_MSG_V01 error message in the > > ath10k_qmi_host_cap_send_sync(). But this error message is not > > fatal to the firmware nor to the ath10k driver and we can still > > bring up the WiFi services successfully if we just ignore it. 
> > > > Hence introducing this firmware quirk to skip host capability > > QMI request for the firmware versions which do not support this > > feature. > > > > Suggested-by: Dmitry Baryshkov > > Signed-off-by: David Heidelberg > > --- > > drivers/net/wireless/ath/ath10k/core.c | 1 + > > drivers/net/wireless/ath/ath10k/core.h | 3 +++ > > drivers/net/wireless/ath/ath10k/qmi.c | 13 ++++++++++--- > > 3 files changed, 14 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c > > index 7c2939cbde5f0..7602631696798 100644 > > --- a/drivers/net/wireless/ath/ath10k/core.c > > +++ b/drivers/net/wireless/ath/ath10k/core.c > > @@ -773,6 +773,7 @@ static const char *const ath10k_core_fw_feature_str[] = { > > [ATH10K_FW_FEATURE_SINGLE_CHAN_INFO_PER_CHANNEL] = "single-chan-info-per-channel", > > [ATH10K_FW_FEATURE_PEER_FIXED_RATE] = "peer-fixed-rate", > > [ATH10K_FW_FEATURE_IRAM_RECOVERY] = "iram-recovery", > > + [ATH10K_FW_FEATURE_NO_HOST_CAP_QMI_REQ] = "no-host-cap-qmi-req", > > }; > > > > static unsigned int ath10k_core_get_fw_feature_str(char *buf, > > diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h > > index 73a9db302245d..b20541e4046f8 100644 > > --- a/drivers/net/wireless/ath/ath10k/core.h > > +++ b/drivers/net/wireless/ath/ath10k/core.h > > @@ -838,6 +838,9 @@ enum ath10k_fw_features { > > /* Firmware support IRAM recovery */ > > ATH10K_FW_FEATURE_IRAM_RECOVERY = 22, > > > > + /* Firmware does not support host capability QMI request */ > > + ATH10K_FW_FEATURE_NO_HOST_CAP_QMI_REQ = 23, > > + > > /* keep last */ > > ATH10K_FW_FEATURE_COUNT, > > }; > > diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c > > index 8275345631a0b..5dc8ea39372c1 100644 > > --- a/drivers/net/wireless/ath/ath10k/qmi.c > > +++ b/drivers/net/wireless/ath/ath10k/qmi.c > > @@ -819,9 +819,16 @@ static void ath10k_qmi_event_server_arrive(struct 
ath10k_qmi *qmi) > > return; > > } > > > > - ret = ath10k_qmi_host_cap_send_sync(qmi); > > - if (ret) > > - return; > > + /* > > + * Skip the host capability request for the firmware versions which > > + * do not support this feature. > > + */ > > + if (!test_bit(ATH10K_FW_FEATURE_NO_HOST_CAP_QMI_REQ, > > + ar->running_fw->fw_file.fw_features)) { > > + ret = ath10k_qmi_host_cap_send_sync(qmi); > > + if (ret) > > + return; > > + } > > > > ret = ath10k_qmi_msa_mem_info_send_sync_msg(qmi); > > if (ret) > > > > -- > David Heidelberg > -- With best wishes Dmitry From dmitry.baryshkov at oss.qualcomm.com Tue Nov 25 06:42:15 2025 From: dmitry.baryshkov at oss.qualcomm.com (Dmitry Baryshkov) Date: Tue, 25 Nov 2025 16:42:15 +0200 Subject: [PATCH v2 0/3] ath10k: Introduce a devicetree quirk to skip host cap QMI requests In-Reply-To: <6a3448cf-dd18-4b3d-a8fa-fe282ee779de@ixit.cz> References: <20251110-skip-host-cam-qmi-req-v2-0-0daf485a987a@ixit.cz> <2b34ceae-5e31-4dba-93e5-3fa35754fab6@oss.qualcomm.com> <6a3448cf-dd18-4b3d-a8fa-fe282ee779de@ixit.cz> Message-ID: On Tue, Nov 25, 2025 at 10:29:23AM +0100, David Heidelberg wrote: > On 10/11/2025 21:41, Dmitry Baryshkov wrote: > > [...] > > > I think this should go to the firmware-N file. SNOC platforms now allow > > per-platform firmware description files, so it's possible to describe > > quirks for the particular firmware file. > > Since the approach to put it into the firmware failed due to early > initialization, see > https://lore.kernel.org/linux-wireless/20251111-xiaomi-beryllium-firmware-v1-0-836b9c51ad86 at ixit.cz/ Is it required before we load the firmware? If so, it must be clearly explained in the commit messages. In the end, if it happens before firmware load, there is little you can do. That was the reason why qcom,no-msa-ready-indicator was implemented as a DT property. > > I wondering if I should get back on this series? > > Also, meanwhile Paul found another device [1] in need of this quirk. 
> > David > > [1] https://lore.kernel.org/all/20250928-judyln-dts-v3-0-b14cf9e9a928@postmarketos.org/T/#m90e8087d4388e588b71a0eff01b88f1721f73b73 > > > > > > > > > So I'm personally OK with this suggested approach. > > > > > > /jeff > > > > -- > David Heidelberg > -- With best wishes Dmitry From rafael at kernel.org Thu Nov 27 09:41:10 2025 From: rafael at kernel.org (Rafael J. Wysocki) Date: Thu, 27 Nov 2025 18:41:10 +0100 Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices In-Reply-To: <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de> References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de> Message-ID: On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf wrote: > > On 21.11.25 at 21:35, Rafael J. Wysocki wrote: > > > On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote: > >> Drivers registering thermal zone/cooling devices are currently unable > >> to tell the thermal core what parent device the new thermal zone/ > >> cooling device should have, potentially causing issues with suspend > >> ordering > > This is one potential class of problems that may arise, but I would > > like to see a real example of this. > > > > As it stands today, thermal_class has no PM callbacks, so there are no > > callback execution ordering issues with devices in that class and what > > other suspend/resume ordering issues are there? > > Correct, that is why i said "potentially". > > > > > Also, the suspend and resume of thermal zones is handled via PM > > notifiers. Is there a problem with this? > > The problem with PM notifiers is that thermal zones stop working even before > user space is frozen. Freezing user space might take a lot of time, so having > no thermal management during this period is less than ideal. This can be addressed by doing thermal zone suspend after freezing tasks and before starting to suspend devices.
Accordingly, thermal zones could be resumed after resuming devices and before thawing tasks. That should not be an overly complex change to make. > This problem would not occur when using dev_pm_ops, as thermal zones would be > suspended after user space has been frozen successfully. Additionally, when using > dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates > that no new devices (including thermal zones and cooling devices) be registered during > a suspend/resume cycle. > > Replacing the PM notifiers with dev_pm_ops would of course be a optimization with > its own patch series. Honestly, I don't see much benefit from using dev_pm_ops for thermal zone devices and cooling devices. Moreover, I actually think that they could be "no PM" devices that are not even put on the suspend-resume device list. Technically, they are just interfaces on top of some other devices allowing the user space to interact with the latter and combining different pieces described by the platform firmware. They by themselves have no PM capabilities. > >> and making it impossible for user space applications to > >> associate a given thermal zone device with its parent device. > > Why does user space need to know the parent of a given cooling device > > or thermal zone? > > Lets say that we have two thermal zones registered by two instances of the > Intel Wifi driver. User space is currently unable to find out which thermal zone > belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]"). But the "belong" part is not quite well defined here. I think that what user space needs to know is what devices are located in a given thermal zone, isn't it? Knowing the parent doesn't necessarily address this. 
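As an aside, the userspace lookup under discussion can be sketched in plain C. This is only a sketch under assumptions: it presumes that a thermal zone directory such as /sys/class/thermal/thermal_zone0 carries a "device" symlink, which today is only the case where a driver creates one by hand (as the ACPI thermal driver does) and is otherwise what the series proposes. read_device_link() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Read the target of "<class_dev_dir>/device" into buf and NUL-terminate it.
 * Returns 0 on success and -1 if the path is too long or the link cannot be
 * read. A target longer than len - 1 bytes is silently truncated, because
 * readlink() does not report truncation.
 */
static int read_device_link(const char *class_dev_dir, char *buf, size_t len)
{
    char path[PATH_MAX];
    ssize_t n;

    assert(class_dev_dir != NULL && buf != NULL && len > 1);

    /* sizeof("/device") already accounts for the terminating NUL. */
    if (strlen(class_dev_dir) + sizeof("/device") > sizeof(path))
        return -1;
    snprintf(path, sizeof(path), "%s/device", class_dev_dir);

    n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return 0;
}
```

A caller would pass the thermal zone directory and then inspect the returned relative path. Whether the device it names is the one user space actually cares about is exactly the open question in this thread.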
> This problem would be solved once we populate the parent device pointer inside the thermal zone > device, as user space can simply look at the "device" symlink to determine the parent device behind > a given thermal zone device. I'm not convinced about this. > Additionally, being able to access the acpi_handle of the parent device will be necessary for the > ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors. I guess by the "parent" you mean the device represented in the ACPI namespace by a ThermalZone object, right? But this is not the same as the "parent" in the Wifi driver context, is it? > >> This patch series aims to fix this issue by extending the functions > >> used to register thermal zone/cooling devices to also accept a parent > >> device pointer. The first six patches convert all functions used for > >> registering cooling devices, while the functions used for registering > >> thermal zone devices are converted by the remaining two patches. > >> > >> I tested this series on various devices containing (among others): > >> - ACPI thermal zones > >> - ACPI processor devices > >> - PCIe cooling devices > >> - Intel Wifi card > >> - Intel powerclamp > >> - Intel TCC cooling > > What exactly did you do to test it? > > I tested: > - the thermal zone temperature readout > - correctness of the new sysfs links > - suspend/resume > > I also verified that ACPI thermal zones still bind with the ACPI fans. I see, thanks. > >> I also compile-tested the remaining affected drivers, however i would > >> still be happy if the relevant maintainers (especially those of the > >> mellanox ethernet switch driver) could take a quick glance at the > >> code and verify that i am using the correct device as the parent > >> device. > > I think that the above paragraph is not relevant any more? > > You are right, however i originally meant to CC the mellanox maintainers as > i was a bit unsure about the changes i made to their driver. 
I will rework > this section in the next revision and CC the mellanox maintainers. > > > > >> This work is also necessary for extending the ACPI thermal zone driver > >> to support the _TZD ACPI object in the future. > > I'm still unsure why _TZD support requires the ability to set a > > thermal zone parent device. > > _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans > and ACPI processors, like ACPI batteries. No, it is not for cooling devices if my reading of the specification is correct. It says: "_TZD (Thermal Zone Devices) This optional object evaluates to a package of device names. Each name corresponds to a device in the ACPI namespace that is associated with the thermal zone. The temperature reported by the thermal zone is roughly correspondent to that of each of the devices." And then "The list of devices returned by the control method need not be a complete and absolute list of devices affected by the thermal zone. However, the package should at least contain the devices that would uniquely identify where this thermal zone is located in the machine. For example, a thermal zone in a docking station should include a device in the docking station, a thermal zone for the CD-ROM bay, should include the CD-ROM." So IIUC this is a list of devices allowing the location of the thermal zone to be figured out. There's nothing about cooling in this definition. > This however will currently not work as > the ACPI thermal zone driver uses the private drvdata of the cooling device to > determine if said cooling device should bind. This only works for ACPI fans and > processors due to the fact that those drivers store a ACPI device pointer inside > drvdata, something the ACPI thermal zone expects. I'm not sure I understand the above. There is a list of ACPI device handles per trip point, as returned by either _PSL or _ALx. 
Devices whose handles are in that list will be bound to the thermal zone, so long as there are struct acpi_device objects representing them which is verified with the help of the devdata field in struct thermal_cooling_device. IOW, cooling device drivers that create struct thermal_cooling_device objects representing them are expected to set devdata in those objects to point to struct acpi_device objects corresponding to their ACPI handles, but in principle acpi_thermal_should_bind_cdev() might as well just use the handles themselves. It just needs to know that there is a cooling driver on the other side of the ACPI handle. The point is that a cooling device to be bound to an ACPI thermal zone needs an ACPI handle in the first place to be listed in _PSL or _ALx. > As we cannot require all cooling devices to store an ACPI device pointer inside > their drvdata field in order to support ACPI, Cooling devices don't store ACPI device pointers in struct thermal_cooling_device objects, ACPI cooling drivers do, and there are two reasons to do that: (1) to associate a given struct thermal_cooling_device with an ACPI handle and (2) to let acpi_thermal_should_bind_cdev() know that the cooling device is present and functional. This can be changed to store an ACPI handle in struct thermal_cooling_device and acpi_thermal_should_bind_cdev() may just verify that the device is there by itself. > we must use a more generic approach. I'm not sure what use case you are talking about. Surely, devices with no representation in the ACPI namespace cannot be bound to ACPI thermal zones. For devices that have a representation in the ACPI namespace, storing an ACPI handle in devdata should not be a problem. > I was thinking about using the acpi_handle of the parent device instead of messing > with the drvdata field, but this only works if the parent device pointer of the > cooling device is populated. 
> > (Cooling devices without a parent device would then be ignored by the ACPI thermal > zone driver, as such cooling devices cannot be linked to ACPI). It can be arranged this way, but what's the practical difference? Anyone who creates a struct thermal_cooling_device and can set its parent pointer to a device with an ACPI companion, may as well set its devdata to point to that companion directly - or to its ACPI handle if that's preferred. From rafael at kernel.org Thu Nov 27 10:22:37 2025 From: rafael at kernel.org (Rafael J. Wysocki) Date: Thu, 27 Nov 2025 19:22:37 +0100 Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices In-Reply-To: <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de> References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de> Message-ID: On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf wrote: > > On 21.11.25 at 21:35, Rafael J. Wysocki wrote: > > > On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote: [...] > >> --- > >> Armin Wolf (8): > >> thermal: core: Allow setting the parent device of cooling devices > >> thermal: core: Set parent device in thermal_of_cooling_device_register() > >> ACPI: processor: Stop creating "device" sysfs link > > > > That link is not to the cooling devices' parent, but to the ACPI > > device object (a struct acpi_device) that corresponds to the parent. > > The parent of the cooling device should be the processor device, not > > its ACPI companion, so I'm not sure why there would be a conflict. > > From the perspective of the Linux device core, a parent device does not have to be > a "physical" device. In the case of the ACPI processor driver, the ACPI device is used, > so the cooling device registered by said driver belongs to the ACPI device. Well, that's a problem. A struct acpi_device should not be a parent of anything other than a struct acpi_device.
> I agree that using the Linux processor device would make more sense, but this will require > changes inside the ACPI processor driver. So be it. > As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks > (the one created by the ACPI processor driver and the one created by the device core) will > be created in the same directory (which is the directory of the cooling device). I see. But why is the new symlink needed in the first place? If the device has a parent, it will appear under that parent in /sys/devices/, won't it? Currently, all of the thermal class devices appear under /sys/devices/virtual/thermal/ because they have no parents and they all get a class parent kobject under /sys/devices/virtual/, as that's what get_device_parent() does. If they have real parents, they will appear under those parents, so why will the parents need to be pointed to additionally? BTW, this means that the layout of /sys/devices/ will change when thermal devices get real parents. I'm not sure if this is a problem, but certainly something to note. > >> ACPI: fan: Stop creating "device" sysfs link > >> ACPI: video: Stop creating "device" sysfs link > > Analogously in the above two cases AFAICS. > > > > The parent of a cooling device should be a "physical" device object, > > like a platform device or a PCI device or similar, not a struct > > acpi_device (which in fact is not a device even). > > From the perspective of the Linux device core, a ACPI device is a perfectly valid device. The driver core is irrelevant here. As I said before, a struct acpi_device object should not be a parent of anything other than a struct acpi_device object. Those things are not devices and they cannot be used for representing PM dependencies, for example. > I agree that using a platform device or PCI device is better, but this already happens > inside the ACPI fan driver (platform device). So it should not happen there. 
> Only the ACPI video driver created a "device" sysfs link that points to the ACPI device > instead of the PCI device. I just noticed that i accidentally changed this by using the > PCI device as the parent device for the cooling device. > > If you want then we can keep this change. The PCI device should be its parent. > >> thermal: core: Set parent device in thermal_cooling_device_register() > >> ACPI: thermal: Stop creating "device" sysfs link > > And this link is to the struct acpi_device representing the thermal zone itself. > > Correct, the ACPI thermal zone driver is a ACPI driver, meaning that he binds to > ACPI devices. Because of this all (thermal zone) devices created by an instance of > said driver are descendants of the ACPI device said instance is bound to. > > We can of course convert the ACPI thermal zone driver into a platform driver, but > this would be a separate patch series. If you want parents, this needs to be done first, but I'm still not sure what the parent of a thermal zone would represent. In the ACPI case it is kind of easy - it would be the (platform) device corresponding to a given ThermalZone object in the ACPI namespace - but it only has a practical meaning if that device has a specific parent. For example, if the corresponding ThermalZone object is present in the \_SB scope, the presence of the thermal zone parent won't provide any additional information. Unfortunately, the language in the specification isn't particularly helpful here: "Thermal zone objects should appear in the namespace under the portion of the system that comprises the thermal zone. For example, a thermal zone that is isolated to a docking station should be defined within the scope of the docking station device." To me "the portion of the system" is not too meaningful unless it is just one device without children. That's why _TZD has been added AFAICS. 
> >> thermal: core: Allow setting the parent device of thermal zone devices > > > > I'm not sure if this is a good idea, at least until it is clear what > > the role of a thermal zone parent device should be. > > Take a look at my explanation with the Intel Wifi driver. I did and I think that you want the parent to be a device somehow associated with the thermal zone, but how exactly? What should that be in the Wifi driver case, the PCI device or something else? And what if the thermal zone affects multiple devices? Which of them (if any) would be its parent? And would it be consistent with the ACPI case described above? All of that needs consideration IMV. From W_Armin at gmx.de Thu Nov 27 12:06:44 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Thu, 27 Nov 2025 21:06:44 +0100 Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices In-Reply-To: References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de> Message-ID: On 27.11.25 at 18:41, Rafael J. Wysocki wrote: > On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf wrote: >> On 21.11.25 at 21:35, Rafael J. Wysocki wrote: >> >>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote: >>>> Drivers registering thermal zone/cooling devices are currently unable >>>> to tell the thermal core what parent device the new thermal zone/ >>>> cooling device should have, potentially causing issues with suspend >>>> ordering >>> This is one potential class of problems that may arise, but I would >>> like to see a real example of this. >>> >>> As it stands today, thermal_class has no PM callbacks, so there are no >>> callback execution ordering issues with devices in that class and what >>> other suspend/resume ordering issues are there? >> Correct, that is why i said "potentially". >> >>> Also, the suspend and resume of thermal zones is handled via PM >>> notifiers. Is there a problem with this?
>> The problem with PM notifiers is that thermal zones stop working even before >> user space is frozen. Freezing user space might take a lot of time, so having >> no thermal management during this period is less than ideal. > This can be addressed by doing thermal zone suspend after freezing > tasks and before starting to suspend devices. Accordingly, thermal > zones could be resumed after resuming devices and before thawing > tasks. That should not be an overly complex change to make. AFAIK this is only possible by using dev_pm_ops, the PM notifier is triggered before tasks are frozen during suspend and after they are thawed during resume. Using dev_pm_ops would also ensure that thermal zone devices are resumed after their parent devices, so no additional changes inside the pm core would be needed. >> This problem would not occur when using dev_pm_ops, as thermal zones would be >> suspended after user space has been frozen successfully. Additionally, when using >> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates >> that no new devices (including thermal zones and cooling devices) be registered during >> a suspend/resume cycle. >> >> Replacing the PM notifiers with dev_pm_ops would of course be a optimization with >> its own patch series. > Honestly, I don't see much benefit from using dev_pm_ops for thermal > zone devices and cooling devices. Moreover, I actually think that > they could be "no PM" devices that are not even put on the > suspend-resume device list. Technically, they are just interfaces on > top of some other devices allowing the user space to interact with the > latter and combining different pieces described by the platform > firmware. They by themselves have no PM capabilities. Correct, thermal zone devices are virtual devices representing thermal management aspects of the underlying parent device. This however does not mean that thermal zone devices have no PM capabilities, because they contain state. 
Some part of this state (namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management, so we should tell the device core about this by using dev_pm_ops instead of the PM notifier. >>>> and making it impossible for user space applications to >>>> associate a given thermal zone device with its parent device. >>> Why does user space need to know the parent of a given cooling device >>> or thermal zone? >> Lets say that we have two thermal zones registered by two instances of the >> Intel Wifi driver. User space is currently unable to find out which thermal zone >> belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]"). > But the "belong" part is not quite well defined here. I think that > what user space needs to know is what devices are located in a given > thermal zone, isn't it? Knowing the parent doesn't necessarily > address this. The device exposing a given thermal zone device is not always a member of the thermal zone itself. In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone associated with their thermal zone device. But thermal zones created thru a system management controller for example might only cover devices like the CPUs and GPUs, not the system management controller device itself. The parent device of a child device is the upstream device of the child device. The connection between parent and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical (PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent and a child device (the child device cannot function without its parent being operational), and user space might want to be able to discover such dependencies. 
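For illustration only (this is not code from the series): a minimal user-space sketch of how such a dependency could be discovered, assuming the thermal zone's sysfs directory carries a "device" symlink to its parent once the parent pointer is populated. The helper name resolve_parent_device() is hypothetical.

```c
#define _XOPEN_SOURCE 700	/* for realpath() */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Resolve the "device" symlink inside a sysfs device directory.
 * Returns 0 and writes the canonical parent path to out on success,
 * -1 if the link does not exist (e.g. a device without a parent)
 * or the result does not fit into out.
 */
int resolve_parent_device(const char *sysfs_dir, char *out, size_t out_len)
{
	char link_path[PATH_MAX];
	char resolved[PATH_MAX];
	int n;

	n = snprintf(link_path, sizeof(link_path), "%s/device", sysfs_dir);
	if (n < 0 || (size_t)n >= sizeof(link_path))
		return -1;

	/* realpath() follows the symlink and canonicalizes the target. */
	if (!realpath(link_path, resolved))
		return -1;

	if (strlen(resolved) >= out_len)
		return -1;

	strcpy(out, resolved);
	return 0;
}
```

A caller would pass something like "/sys/class/thermal/thermal_zone0" and get back the path of the device that registered the zone; today, with no parent set, the call would simply fail.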
>> This problem would be solved once we populate the parent device pointer inside the thermal zone >> device, as user space can simply look at the "device" symlink to determine the parent device behind >> a given thermal zone device. > I'm not convinced about this. > >> Additionally, being able to access the acpi_handle of the parent device will be necessary for the >> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors. > I guess by the "parent" you mean the device represented in the ACPI > namespace by a ThermalZone object, right? But this is not the same as > the "parent" in the Wifi driver context, is it? In the context of an ACPI ThermalZone, the parent device of the thermal cooling device would currently be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent device would be the PCI device bound to the corresponding Intel Wifi driver. I think you misunderstood what kind of parent device i was referring to. You likely thought that i was referring to the parent device of the ACPI ThermalZone, right? That however is not the case; with "parent device" i was referring to the device responsible for creating a given struct thermal_zone_device instance. >>>> This patch series aims to fix this issue by extending the functions >>>> used to register thermal zone/cooling devices to also accept a parent >>>> device pointer. The first six patches convert all functions used for >>>> registering cooling devices, while the functions used for registering >>>> thermal zone devices are converted by the remaining two patches. >>>> >>>> I tested this series on various devices containing (among others): >>>> - ACPI thermal zones >>>> - ACPI processor devices >>>> - PCIe cooling devices >>>> - Intel Wifi card >>>> - Intel powerclamp >>>> - Intel TCC cooling >>> What exactly did you do to test it?
>> I tested: >> - the thermal zone temperature readout >> - correctness of the new sysfs links >> - suspend/resume >> >> I also verified that ACPI thermal zones still bind with the ACPI fans. > I see, thanks. > >>>> I also compile-tested the remaining affected drivers, however i would >>>> still be happy if the relevant maintainers (especially those of the >>>> mellanox ethernet switch driver) could take a quick glance at the >>>> code and verify that i am using the correct device as the parent >>>> device. >>> I think that the above paragraph is not relevant any more? >> You are right, however i originally meant to CC the mellanox maintainers as >> i was a bit unsure about the changes i made to their driver. I will rework >> this section in the next revision and CC the mellanox maintainers. >> >>>> This work is also necessary for extending the ACPI thermal zone driver >>>> to support the _TZD ACPI object in the future. >>> I'm still unsure why _TZD support requires the ability to set a >>> thermal zone parent device. >> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans >> and ACPI processors, like ACPI batteries. > No, it is not for cooling devices if my reading of the specification > is correct. It says: > > "_TZD (Thermal Zone Devices) > > This optional object evaluates to a package of device names. Each name > corresponds to a device in the ACPI namespace that is associated with > the thermal zone. The temperature reported by the thermal zone is > roughly correspondent to that of each of the devices." > > And then > > "The list of devices returned by the control method need not be a > complete and absolute list of devices affected by the thermal zone. > However, the package should at least contain the devices that would > uniquely identify where this thermal zone is located in the machine. 
> For example, a thermal zone in a docking station should include a > device in the docking station, a thermal zone for the CD-ROM bay, > should include the CD-ROM." > > So IIUC this is a list of devices allowing the location of the thermal > zone to be figured out. There's nothing about cooling in this > definition. Using _TZD to figure out the location of a given thermal zone is another usage of this ACPI control method, but let's take a look at section 11.6: - If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist. - If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the _TZD device list or devices' _TZM objects, must support device performance states. So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling. This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an extension of _PSL for things like ACPI control method batteries (see 10.2.2.12). Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide section "Thermally managed devices" paragraph "Processor aggregator"). >> This however will currently not work as >> the ACPI thermal zone driver uses the private drvdata of the cooling device to >> determine if said cooling device should bind. This only works for ACPI fans and >> processors due to the fact that those drivers store a ACPI device pointer inside >> drvdata, something the ACPI thermal zone expects. > I'm not sure I understand the above. > > There is a list of ACPI device handles per trip point, as returned by > either _PSL or _ALx.
Devices whose handles are in that list will be > bound to the thermal zone, so long as there are struct acpi_device > objects representing them which is verified with the help of the > devdata field in struct thermal_cooling_device. AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state container struct of the associated device driver instance. Assuming that a given device driver will populate devdata with a pointer to its ACPI companion device is an implementation-specific detail that does not apply to all cooling device implementations. It just so happens that the ACPI processor and fan driver do this, likely because they were designed specifically to work with the ACPI thermal zone driver. The documentation of thermal_cooling_device_register() even describes devdata as "device private data", so any meaning of devdata purely depends on the given device driver. > IOW, cooling device drivers that create struct thermal_cooling_device > objects representing them are expected to set devdata in those objects > to point to struct acpi_device objects corresponding to their ACPI > handles, but in principle acpi_thermal_should_bind_cdev() might as > well just use the handles themselves. It just needs to know that > there is a cooling driver on the other side of the ACPI handle. > > The point is that a cooling device to be bound to an ACPI thermal zone > needs an ACPI handle in the first place to be listed in _PSL or _ALx. Correct, i merely change the way the ACPI thermal zone driver retrieves the ACPI handle associated with a given cooling device.
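As a rough illustration of that change (a user-space toy model, not kernel code, and not taken from the patches): the sketch below assumes the handle is derived from the cooling device's parent and that devdata stays opaque. All toy_* names are hypothetical stand-ins; the handle list plays the role of what _PSL or _ALx would return.

```c
#include <stddef.h>

/* Toy stand-ins for kernel objects; all names here are hypothetical. */
struct toy_device {
	struct toy_device *parent;	/* upstream device, may be NULL */
	const void *fw_handle;		/* firmware (e.g. ACPI) handle, may be NULL */
};

struct toy_cooling_device {
	struct toy_device dev;
	void *devdata;			/* driver-private, opaque to everyone else */
};

/* Derive the firmware handle from the parent only; devdata is never inspected. */
const void *toy_cdev_fw_handle(const struct toy_cooling_device *cdev)
{
	if (!cdev->dev.parent)
		return NULL;		/* no parent: cannot be linked to firmware */
	return cdev->dev.parent->fw_handle;
}

/* Bind only if the derived handle appears in the zone's handle list. */
int toy_should_bind(const struct toy_cooling_device *cdev,
		    const void *const *handles, size_t count)
{
	const void *h = toy_cdev_fw_handle(cdev);
	size_t i;

	if (!h)
		return 0;
	for (i = 0; i < count; i++)
		if (handles[i] == h)
			return 1;
	return 0;
}
```

In this model a cooling device without a parent is never bound, whatever its devdata happens to point to, which matches the "ignored by the ACPI thermal zone driver" behaviour discussed in this thread.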
>> As we cannot require all cooling devices to store an ACPI device pointer inside >> their drvdata field in order to support ACPI, > Cooling devices don't store ACPI device pointers in struct > thermal_cooling_device objects, ACPI cooling drivers do, and there are > two reasons to do that: (1) to associate a given struct > thermal_cooling_device with an ACPI handle and (2) to let > acpi_thermal_should_bind_cdev() know that the cooling device is > present and functional. > > This can be changed to store an ACPI handle in struct > thermal_cooling_device and acpi_thermal_should_bind_cdev() may just > verify that the device is there by itself. I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that can be used for both ACPI and OF based cooling device identification, if this is what you prefer. This patch series would then turn into a cleanup series, focusing on properly adding thermal zone devices and cooling devices into the global device hierarchy. >> we must use a more generic approach. > I'm not sure what use case you are talking about. > > Surely, devices with no representation in the ACPI namespace cannot be > bound to ACPI thermal zones. For devices that have a representation > in the ACPI namespace, storing an ACPI handle in devdata should not be > a problem. See my above explanations for details, drvdata is defined to hold device private data, nothing more. >> I was thinking about using the acpi_handle of the parent device instead of messing >> with the drvdata field, but this only works if the parent device pointer of the >> cooling device is populated. >> >> (Cooling devices without a parent device would then be ignored by the ACPI thermal >> zone driver, as such cooling devices cannot be linked to ACPI). > It can be arranged this way, but what's the practical difference? 
> Anyone who creates a struct thermal_cooling_device and can set its > parent pointer to a device with an ACPI companion, may as well set its > devdata to point to that companion directly - or to its ACPI handle if > that's preferred. Yes, but this would require explicit support for ACPI in every driver that registers cooling devices. Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle of their choice when creating a cooling device will fix this. Thanks, Armin Wolf From W_Armin at gmx.de Thu Nov 27 12:29:15 2025 From: W_Armin at gmx.de (Armin Wolf) Date: Thu, 27 Nov 2025 21:29:15 +0100 Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices In-Reply-To: References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de> Message-ID: On 27.11.25 at 19:22, Rafael J. Wysocki wrote: > On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf wrote: >> On 21.11.25 at 21:35, Rafael J. Wysocki wrote: >> >>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote: > [...] > >>>> --- >>>> Armin Wolf (8): >>>> thermal: core: Allow setting the parent device of cooling devices >>>> thermal: core: Set parent device in thermal_of_cooling_device_register() >>>> ACPI: processor: Stop creating "device" sysfs link >>> That link is not to the cooling devices' parent, but to the ACPI >>> device object (a struct acpi_device) that corresponds to the parent. >>> The parent of the cooling device should be the processor device, not >>> its ACPI companion, so I'm not sure why there would be a conflict. >> From the perspective of the Linux device core, a parent device does not have to be >> a "physical" device. In the case of the ACPI processor driver, the ACPI device is used, >> so the cooling device registered by said driver belongs to the ACPI device. > Well, that's a problem.
A struct acpi_device should not be a parent > of anything other than a struct acpi_device. Understandable, in this case we should indeed use the CPU device, especially since the fwnode associated with it already points to the correct ACPI processor object (at least on my machine). >> I agree that using the Linux processor device would make more sense, but this will require >> changes inside the ACPI processor driver. > So be it. OK. >> As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks >> (the one created by the ACPI processor driver and the one created by the device core) will >> be created in the same directory (which is the directory of the cooling device). > I see. > > But why is the new symlink needed in the first place? If the device > has a parent, it will appear under that parent in /sys/devices/, won't > it? > > Currently, all of the thermal class devices appear under > /sys/devices/virtual/thermal/ because they have no parents and they > all get a class parent kobject under /sys/devices/virtual/, as that's > what get_device_parent() does. > > If they have real parents, they will appear under those parents, so > why will the parents need to be pointed to additionally? The "device" symlink is a comfort feature provided by the device core itself to allow user space applications to traverse the device tree from bottom to top, like a double-linked list. We cannot disable the creation of this symlink, nor should we. > BTW, this means that the layout of /sys/devices/ will change when > thermal devices get real parents. I'm not sure if this is a problem, > but certainly something to note. I know, most applications likely use /sys/class/thermal/, so they are not impacted by this. I will note this in the cover letter of the next revision. >>>> ACPI: fan: Stop creating "device" sysfs link >>>> ACPI: video: Stop creating "device" sysfs link >>> Analogously in the above two cases AFAICS.
>>> >>> The parent of a cooling device should be a "physical" device object, >>> like a platform device or a PCI device or similar, not a struct >>> acpi_device (which in fact is not a device even). >> From the perspective of the Linux device core, a ACPI device is a perfectly valid device. > The driver core is irrelevant here. > > As I said before, a struct acpi_device object should not be a parent > of anything other than a struct acpi_device object. Those things are > not devices and they cannot be used for representing PM dependencies, > for example. > >> I agree that using a platform device or PCI device is better, but this already happens >> inside the ACPI fan driver (platform device). > So it should not happen there. I meant that the ACPI fan driver already uses the platform device as the parent device of the cooling device, so the ACPI device is only used for interacting with the ACPI control methods (and registering sysfs attributes i think). >> Only the ACPI video driver created a "device" sysfs link that points to the ACPI device >> instead of the PCI device. I just noticed that i accidentally changed this by using the >> PCI device as the parent device for the cooling device. >> >> If you want then we can keep this change. > The PCI device should be its parent. Alright, i will note this in the patch description. >>>> thermal: core: Set parent device in thermal_cooling_device_register() >>>> ACPI: thermal: Stop creating "device" sysfs link >>> And this link is to the struct acpi_device representing the thermal zone itself. >> Correct, the ACPI thermal zone driver is a ACPI driver, meaning that he binds to >> ACPI devices. Because of this all (thermal zone) devices created by an instance of >> said driver are descendants of the ACPI device said instance is bound to. >> >> We can of course convert the ACPI thermal zone driver into a platform driver, but >> this would be a separate patch series. 
> If you want parents, this needs to be done first, but I'm still not > sure what the parent of a thermal zone would represent. > > In the ACPI case it is kind of easy - it would be the (platform) > device corresponding to a given ThermalZone object in the ACPI > namespace - but it only has a practical meaning if that device has a > specific parent. For example, if the corresponding ThermalZone object > is present in the \_SB scope, the presence of the thermal zone parent > won't provide any additional information. To the device core it will, as the platform device will need to be suspended after the thermal zone device has been suspended, among other things. > Unfortunately, the language in the specification isn't particularly > helpful here: "Thermal zone objects should appear in the namespace > under the portion of the system that comprises the thermal zone. For > example, a thermal zone that is isolated to a docking station should > be defined within the scope of the docking station device." To me > "the portion of the system" is not too meaningful unless it is just > one device without children. That's why _TZD has been added AFAICS. I think you are confusing the parent device of the ThermalZone ACPI device with the parent device of the struct thermal_zone_device. I begin to wonder if mentioning the ACPI ThermalZone device together with the struct thermal_zone_device was a bad idea on my side xd. >>>> thermal: core: Allow setting the parent device of thermal zone devices >>> I'm not sure if this is a good idea, at least until it is clear what >>> the role of a thermal zone parent device should be. >> Take a look at my explanation with the Intel Wifi driver. > I did and I think that you want the parent to be a device somehow > associated with the thermal zone, but how exactly? What should that > be in the Wifi driver case, the PCI device or something else? > > And what if the thermal zone affects multiple devices? Which of them > (if any) would be its parent? 
And would it be consistent with the > ACPI case described above? > > All of that needs consideration IMV. I agree, but there is a difference between "this struct thermal_zone_device depends on device X to be operational" and "this thermal zone affects device X, device Y and device Z". This patch series exclusively deals with telling the driver core that "this struct thermal_zone_device depends on device X to be operational". Thanks, Armin Wolf

From rafael at kernel.org Thu Nov 27 13:46:57 2025
From: rafael at kernel.org (Rafael J. Wysocki)
Date: Thu, 27 Nov 2025 22:46:57 +0100
Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices
In-Reply-To:
References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de>
Message-ID:

On Thu, Nov 27, 2025 at 9:06 PM Armin Wolf wrote: > > On 27.11.25 at 18:41, Rafael J. Wysocki wrote: > > > On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf wrote: > >> On 21.11.25 at 21:35, Rafael J. Wysocki wrote: > >> > >>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote: > >>>> Drivers registering thermal zone/cooling devices are currently unable > >>>> to tell the thermal core what parent device the new thermal zone/ > >>>> cooling device should have, potentially causing issues with suspend > >>>> ordering > >>> This is one potential class of problems that may arise, but I would > >>> like to see a real example of this. > >>> > >>> As it stands today, thermal_class has no PM callbacks, so there are no > >>> callback execution ordering issues with devices in that class and what > >>> other suspend/resume ordering issues are there? > >> Correct, that is why i said "potentially". > >> > >>> Also, the suspend and resume of thermal zones is handled via PM > >>> notifiers. Is there a problem with this? > >> The problem with PM notifiers is that thermal zones stop working even before > >> user space is frozen.
Freezing user space might take a lot of time, so having > >> no thermal management during this period is less than ideal. > > This can be addressed by doing thermal zone suspend after freezing > > tasks and before starting to suspend devices. Accordingly, thermal > > zones could be resumed after resuming devices and before thawing > > tasks. That should not be an overly complex change to make. > > AFAIK this is only possible by using dev_pm_ops, Of course it is not the case. For example, thermal_pm_notify_prepare() could be called directly from dpm_prepare() and thermal_pm_notify_complete() could be called directly from dpm_complete() (which would require switching over thermal to a non-freezable workqueue). > the PM notifier is triggered before tasks are frozen during suspend and after they are thawed during resume. I know that. > Using dev_pm_ops would also ensure that thermal zone devices are resumed after their > parent devices, so no additional changes inside the pm core would be needed. Not really. thermal_pm_suspended needs to be set and cleared from somewhere. > >> This problem would not occur when using dev_pm_ops, as thermal zones would be > >> suspended after user space has been frozen successfully. Additionally, when using > >> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates > >> that no new devices (including thermal zones and cooling devices) be registered during > >> a suspend/resume cycle. > >> > >> Replacing the PM notifiers with dev_pm_ops would of course be a optimization with > >> its own patch series. > > > > Honestly, I don't see much benefit from using dev_pm_ops for thermal > > zone devices and cooling devices. Moreover, I actually think that > > they could be "no PM" devices that are not even put on the > > suspend-resume device list. 
Technically, they are just interfaces on > > top of some other devices allowing the user space to interact with the > > latter and combining different pieces described by the platform > > firmware. They by themselves have no PM capabilities. > > Correct, thermal zone devices are virtual devices representing thermal management > aspects of the underlying parent device. This however does not mean that thermal zone > devices have no PM capabilities, because they contain state. Some part of this state > (namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management, > so we should tell the device core about this by using dev_pm_ops instead of the PM notifier. Changing the zone state to anything different from TZ_STATE_READY causes __thermal_zone_device_update() to do nothing and this is the whole "suspend". It does not need to be done from a PM callback and I see no reason why doing it from a PM callback would be desirable. Sorry. Apart from the above, TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING are only used for coordination between thermal_zone_pm_prepare(), thermal_zone_device_resume() and thermal_zone_pm_complete(), so this is not a state anything other than the specific thermal zone in question cares about. Moreover, resuming a thermal zone before resuming any cooling devices bound to it would almost certainly break things and I'm not sure how you would make that work with dev_pm_ops. BTW, using device links for this is not an option as far as I'm concerned. > >>>> and making it impossible for user space applications to > >>>> associate a given thermal zone device with its parent device. > >>> Why does user space need to know the parent of a given cooling device > >>> or thermal zone? > >> Let's say that we have two thermal zones registered by two instances of the > >> Intel Wifi driver.
User space is currently unable to find out which thermal zone > >> belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]"). > > But the "belong" part is not quite well defined here. I think that > > what user space needs to know is what devices are located in a given > > thermal zone, isn't it? Knowing the parent doesn't necessarily > > address this. > > The device exposing a given thermal zone device is not always a member of the thermal zone itself. > In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone > associated with their thermal zone device. But thermal zones created thru a system management controller > for example might only cover devices like the CPUs and GPUs, not the system management controller device itself. Well, exactly. > The parent device of a child device is the upstream device of the child device. The connection between parent > and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical > (PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent > and a child device (the child device cannot function without its parent being operational), and user space > might want to be able to discover such dependencies. But this needs to be consistent. If the parent of one thermal zone represents the device affected by it and the parent of another thermal zone represents something else, user space will need platform-specific knowledge to figure this out, which is the case today. Without consistency, this is just not useful. > >> This problem would be solved once we populate the parent device pointer inside the thermal zone > >> device, as user space can simply look at the "device" symlink to determine the parent device behind > >> a given thermal zone device. > > I'm not convinced about this. 
> > > >> Additionally, being able to access the acpi_handle of the parent device will be necessary for the > >> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors. > > I guess by the "parent" you mean the device represented in the ACPI > > namespace by a ThermalZone object, right? But this is not the same as > > the "parent" in the Wifi driver context, is it? > > In the context of an ACPI ThermalZone, the parent device of the thermal cooling device would currently > be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent > device would be the PCI device bound to the corresponding Intel Wifi driver. > > I think you misunderstood what kind of parent device i was referring to. You likely thought that i was referring > to the parent device of the ACPI ThermalZone, right? No. I thought that you were referring to the ACPI ThermalZone itself. Or rather, a platform device associated with the ACPI ThermalZone (that is, the device the ACPI ThermalZone is the ACPI_COMPANION() of). > That however is not the case, with "parent device" i was > referring to the device responsible for creating a given struct thermal_zone_device instance. So I was not confused. > >>>> This patch series aims to fix this issue by extending the functions > >>>> used to register thermal zone/cooling devices to also accept a parent > >>>> device pointer. The first six patches convert all functions used for > >>>> registering cooling devices, while the functions used for registering > >>>> thermal zone devices are converted by the remaining two patches. > >>>> > >>>> I tested this series on various devices containing (among others): > >>>> - ACPI thermal zones > >>>> - ACPI processor devices > >>>> - PCIe cooling devices > >>>> - Intel Wifi card > >>>> - Intel powerclamp > >>>> - Intel TCC cooling > >>> What exactly did you do to test it?
> >> I tested: > >> - the thermal zone temperature readout > >> - correctness of the new sysfs links > >> - suspend/resume > >> > >> I also verified that ACPI thermal zones still bind with the ACPI fans. > > I see, thanks. > > > >>>> I also compile-tested the remaining affected drivers, however i would > >>>> still be happy if the relevant maintainers (especially those of the > >>>> mellanox ethernet switch driver) could take a quick glance at the > >>>> code and verify that i am using the correct device as the parent > >>>> device. > >>> I think that the above paragraph is not relevant any more? > >> You are right, however i originally meant to CC the mellanox maintainers as > >> i was a bit unsure about the changes i made to their driver. I will rework > >> this section in the next revision and CC the mellanox maintainers. > >> > >>>> This work is also necessary for extending the ACPI thermal zone driver > >>>> to support the _TZD ACPI object in the future. > >>> I'm still unsure why _TZD support requires the ability to set a > >>> thermal zone parent device. > >> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans > >> and ACPI processors, like ACPI batteries. > > No, it is not for cooling devices if my reading of the specification > > is correct. It says: > > > > "_TZD (Thermal Zone Devices) > > > > This optional object evaluates to a package of device names. Each name > > corresponds to a device in the ACPI namespace that is associated with > > the thermal zone. The temperature reported by the thermal zone is > > roughly correspondent to that of each of the devices." > > > > And then > > > > "The list of devices returned by the control method need not be a > > complete and absolute list of devices affected by the thermal zone. > > However, the package should at least contain the devices that would > > uniquely identify where this thermal zone is located in the machine. 
> > For example, a thermal zone in a docking station should include a > > device in the docking station, a thermal zone for the CD-ROM bay, > > should include the CD-ROM." > > > > So IIUC this is a list of devices allowing the location of the thermal > > zone to be figured out. There's nothing about cooling in this > > definition. > > Using _TZD to figure out the location of a given thermal zone is another usage > of this ACPI control method, but let's take a look at section 11.6: > > - If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist. > - If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the > _TZD device list or devices' _TZM objects, must support device performance states. > > So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling. But it doesn't actually say how those "device performance states" are supposed to be used for cooling, does it? > This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an > extension of _PSL for things like ACPI control method batteries (see 10.2.2.12). But not everything in _TZD needs to be a potential "cooling device" and how you'll decide which one is? > Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide > section "Thermally managed devices" paragraph "Processor aggregator"). Interesting. I agree that it would make sense to follow them because there will be platform dependencies on that, if there aren't already. > >> This however will currently not work as > >> the ACPI thermal zone driver uses the private drvdata of the cooling device to > >> determine if said cooling device should bind.
This only works for ACPI fans and > >> processors due to the fact that those drivers store a ACPI device pointer inside > >> drvdata, something the ACPI thermal zone expects. > > I'm not sure I understand the above. > > > > There is a list of ACPI device handles per trip point, as returned by > > either _PSL or _ALx. Devices whose handles are in that list will be > > bound to the thermal zone, so long as there are struct acpi_device > > objects representing them which is verified with the help of the > > devdata field in struct thermal_cooling_device. > > AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state > container struct of the associated device driver instance. Assuming that a given device driver > will populate devdata with a pointer to its ACPI companion device is an implementation-specific > detail that does not apply to all cooling device implementations. It just so happens that the > ACPI processor and fan driver do this, likely because they were designed specifically to work > with the ACPI thermal zone driver. > > The documentation of thermal_cooling_device_register() even describes devdata as "device private data", so any meaning of devdata purely depends on the > given device driver. Yes, and these particular drivers decide to store a pointer to struct acpi_device in it. But this is not super important, they might as well set the ACPI_COMPANION() of the cooling device to the corresponding struct acpi_device and the ACPI thermal driver might use that information. I'm not opposed to using parents for this purpose, but it doesn't change the big picture that the ACPI thermal driver will need to know the ACPI handle corresponding to each cooling device. If you want to use _TZD instead of or in addition to _PSL for this, it doesn't change much here, it's just another list of ACPI handles, so saying that parents are needed for supporting this is not exactly accurate IMV.
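[Editorial sketch, not part of the original message.] The matching step described above, where a cooling device is resolved to an ACPI handle and looked up in a list of handles, can be illustrated with a small user-space mock. It assumes the handle is taken from the ACPI companion of the cooling device's parent, which is only one of the options raised in this thread. Every `mock_*` name below is a hypothetical stand-in and not a kernel type or function; in particular this is not how acpi_thermal_should_bind_cdev() is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ACPI handle; NOT the kernel's acpi_handle. */
typedef const void *mock_acpi_handle;

/* Hypothetical, heavily reduced stand-in for struct device. */
struct mock_device {
	struct mock_device *parent;	/* NULL for parentless (virtual) devices */
	mock_acpi_handle companion;	/* NULL when there is no ACPI companion */
};

/* Resolve a cooling device to an ACPI handle through its parent, if any. */
static mock_acpi_handle mock_cdev_handle(const struct mock_device *cdev)
{
	if (!cdev || !cdev->parent)
		return NULL;

	return cdev->parent->companion;
}

/*
 * Check whether the cooling device is named in a list of handles.
 * Where the list comes from (_PSL, _ALx, _TZD, ...) does not matter here.
 */
static bool mock_should_bind(const struct mock_device *cdev,
			     const mock_acpi_handle *handles, size_t count)
{
	mock_acpi_handle handle = mock_cdev_handle(cdev);
	size_t i;

	if (!handle)
		return false;

	for (i = 0; i < count; i++) {
		if (handles[i] == handle)
			return true;
	}

	return false;
}
```

In this model a cooling device without a parent, or whose parent has no companion, never matches any list, so it would simply be skipped by the ACPI side and would have to be bound by some other means.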
> > IOW, cooling device drivers that create struct thermal_cooling_device > > objects representing them are expected to set devdata in those objects > > to point to struct acpi_device objects corresponding to their ACPI > > handles, but in principle acpi_thermal_should_bind_cdev() might as > > well just use the handles themselves. It just needs to know that > > there is a cooling driver on the other side of the ACPI handle. > > > > The point is that a cooling device to be bound to an ACPI thermal zone > > needs an ACPI handle in the first place to be listed in _PSL or _ALx. > > Correct, i merely change the way the ACPI thermal zone driver retrieves the > ACPI handle associated with a given cooling device. Right. > >> As we cannot require all cooling devices to store an ACPI device pointer inside > >> their drvdata field in order to support ACPI, > > Cooling devices don't store ACPI device pointers in struct > > thermal_cooling_device objects, ACPI cooling drivers do, and there are > > two reasons to do that: (1) to associate a given struct > > thermal_cooling_device with an ACPI handle and (2) to let > > acpi_thermal_should_bind_cdev() know that the cooling device is > > present and functional. > > > > This can be changed to store an ACPI handle in struct > > thermal_cooling_device and acpi_thermal_should_bind_cdev() may just > > verify that the device is there by itself. > > I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that > can be used for both ACPI and OF based cooling device identification, if this is what you > prefer. I'm not sure about this ATM and see below. > This patch series would then turn into a cleanup series, focusing on properly adding > thermal zone devices and cooling devices into the global device hierarchy. I'd prefer to do one thing at a time though. If you want cooling devices to get parents, fine. 
I'm not fundamentally opposed to that idea, but let's have clear rules for device drivers on how to set those parents for the sake of consistency. As for the ACPI case, one rule that I want to be followed (as already stated multiple times) is that a struct acpi_device can only be a parent of another struct acpi_device. This means that the parent of a cooling device needs to be a platform device or similar representing the actual device that will be used for implementing the cooling. A separate question is how acpi_thermal_should_bind_cdev() will match cooling devices with the ACPI handles coming from _PSL, _ALx, _TZD etc. and the rule can be that it will look at the ACPI_COMPANION() of the parent of the given cooling device. > >> we must use a more generic approach. > > I'm not sure what use case you are talking about. > > > > Surely, devices with no representation in the ACPI namespace cannot be > > bound to ACPI thermal zones. For devices that have a representation > > in the ACPI namespace, storing an ACPI handle in devdata should not be > > a problem. > > See my above explanations for details, drvdata is defined to hold device private data, > nothing more. This is related to the discussion below. > >> I was thinking about using the acpi_handle of the parent device instead of messing > >> with the drvdata field, but this only works if the parent device pointer of the > >> cooling device is populated. > >> > >> (Cooling devices without a parent device would then be ignored by the ACPI thermal > >> zone driver, as such cooling devices cannot be linked to ACPI). > > It can be arranged this way, but what's the practical difference? > > Anyone who creates a struct thermal_cooling_device and can set its > > parent pointer to a device with an ACPI companion, may as well set its > > devdata to point to that companion directly - or to its ACPI handle if > > that's preferred. 
> > Yes, but this would require explicit support for ACPI in every driver that registers cooling devices. So you want to have generic drivers that may work on ACPI platforms and on DT platforms to be able to create cooling devices for use with ACPI thermal zones. Well, had you started the whole discussion with this statement, it would have been much easier to understand your point. > Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle > of their choice when creating a cooling device will fix this. If you go the parents route, this is an important consideration for the rules on how to set those parents. Namely, they would need to be set so that the fwnode_handle of the parent could be used for binding the cooling device to a thermal zone either on ACPI or on DT systems. Of course, there are also cooling devices whose parents will not have an fwnode_handle and they would still need to work in this brave new world.

From rafael at kernel.org Thu Nov 27 14:14:27 2025
From: rafael at kernel.org (Rafael J. Wysocki)
Date: Thu, 27 Nov 2025 23:14:27 +0100
Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices
In-Reply-To:
References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de>
Message-ID:

On Thu, Nov 27, 2025 at 9:29 PM Armin Wolf wrote: > > On 27.11.25 at 19:22, Rafael J. Wysocki wrote: > > > On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf wrote: > >> On 21.11.25 at 21:35, Rafael J. Wysocki wrote: > >> > >>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote: > > [...]
> > > >>>> --- > >>>> Armin Wolf (8): > >>>> thermal: core: Allow setting the parent device of cooling devices > >>>> thermal: core: Set parent device in thermal_of_cooling_device_register() > >>>> ACPI: processor: Stop creating "device" sysfs link > >>> That link is not to the cooling devices' parent, but to the ACPI > >>> device object (a struct acpi_device) that corresponds to the parent. > >>> The parent of the cooling device should be the processor device, not > >>> its ACPI companion, so I'm not sure why there would be a conflict. > >> From the perspective of the Linux device core, a parent device does not have to be > >> a "physical" device. In the case of the ACPI processor driver, the ACPI device is used, > >> so the cooling device registered by said driver belongs to the ACPI device. > > Well, that's a problem. A struct acpi_device should not be a parent > > of anything other than a struct acpi_device. > > Understandable, in this case we should indeed use the CPU device, especially since the fwnode > associated with it already points to the correct ACPI processor object (at least on my machine). > > >> I agree that using the Linux processor device would make more sense, but this will require > >> changes inside the ACPI processor driver. > > So be it. > > OK. > > >> As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks > >> (the one created by the ACPI processor driver and the one created by the device core) will > >> be created in the same directory (which is the directory of the cooling device). > > I see. > > > > But why is the new symlink needed in the first place? If the device > > has a parent, it will appear under that parent in /sys/devices/, won't > > it? > > > > Currently, all of the thermal class devices appear under > > /sys/devices/virtual/thermal/ because they have no parents and they > > all get a class parent kobject under /sys/devices/virtual/, as that's > > what get_device_parent() does.
> > > > If they have real parents, they will appear under those parents, so > > why will the parents need to be pointed to additionally? > > The "device" symlink is a comfort feature provided by the device core itself to allow user space > applications to traverse the device tree from bottom to top, like a double-linked list. We cannot > disable the creation of this symlink, nor should we. I think you mean device_add_class_symlinks(), but that's just for class devices. Of course, thermal devices are class devices, so they'll get those links if they get parents. Fair enough. > > BTW, this means that the layout of /sys/devices/ will change when > > thermal devices get real parents. I'm not sure if this is a problem, > > but certainly something to note. > > I know, most applications likely use /sys/class/thermal/, so they are not impacted by this. I will > note this in the cover letter of the next revision. > > >>>> ACPI: fan: Stop creating "device" sysfs link > >>>> ACPI: video: Stop creating "device" sysfs link > >>> Analogously in the above two cases AFAICS. > >>> > >>> The parent of a cooling device should be a "physical" device object, > >>> like a platform device or a PCI device or similar, not a struct > >>> acpi_device (which in fact is not a device even). > >> From the perspective of the Linux device core, a ACPI device is a perfectly valid device. > > The driver core is irrelevant here. > > > > As I said before, a struct acpi_device object should not be a parent > > of anything other than a struct acpi_device object. Those things are > > not devices and they cannot be used for representing PM dependencies, > > for example. > > > >> I agree that using a platform device or PCI device is better, but this already happens > >> inside the ACPI fan driver (platform device). > > So it should not happen there.
> > I meant that the ACPI fan driver already uses the platform device as the parent device of the > cooling device, so the ACPI device is only used for interacting with the ACPI control methods > (and registering sysfs attributes i think). OK > >> Only the ACPI video driver created a "device" sysfs link that points to the ACPI device > >> instead of the PCI device. I just noticed that i accidentally changed this by using the > >> PCI device as the parent device for the cooling device. > >> > >> If you want then we can keep this change. > > The PCI device should be its parent. > > Alright, i will note this in the patch description. > > >>>> thermal: core: Set parent device in thermal_cooling_device_register() > >>>> ACPI: thermal: Stop creating "device" sysfs link > >>> And this link is to the struct acpi_device representing the thermal zone itself. > >> Correct, the ACPI thermal zone driver is a ACPI driver, meaning that he binds to > >> ACPI devices. Because of this all (thermal zone) devices created by an instance of > >> said driver are descendants of the ACPI device said instance is bound to. > >> > >> We can of course convert the ACPI thermal zone driver into a platform driver, but > >> this would be a separate patch series. > > If you want parents, this needs to be done first, but I'm still not > > sure what the parent of a thermal zone would represent. > > > > In the ACPI case it is kind of easy - it would be the (platform) > > device corresponding to a given ThermalZone object in the ACPI > > namespace - but it only has a practical meaning if that device has a > > specific parent. For example, if the corresponding ThermalZone object > > is present in the \_SB scope, the presence of the thermal zone parent > > won't provide any additional information. > > To the device core it will, as the platform device will need to be suspended > after the thermal zone device has been suspended, among other things. 
Let's set suspend aside for now, I think I've explained my viewpoint on this enough elsewhere. > > Unfortunately, the language in the specification isn't particularly > > helpful here: "Thermal zone objects should appear in the namespace > > under the portion of the system that comprises the thermal zone. For > > example, a thermal zone that is isolated to a docking station should > > be defined within the scope of the docking station device." To me > > "the portion of the system" is not too meaningful unless it is just > > one device without children. That's why _TZD has been added AFAICS. > > I think you are confusing the parent device of the ThermalZone ACPI device > with the parent device of the struct thermal_zone_device. No, I'm not. > I begin to wonder if mentioning the ACPI ThermalZone device together with the > struct thermal_zone_device was a bad idea on my side xd. Maybe. > >>>> thermal: core: Allow setting the parent device of thermal zone devices > >>> I'm not sure if this is a good idea, at least until it is clear what > >>> the role of a thermal zone parent device should be. > >> Take a look at my explanation with the Intel Wifi driver. > > I did and I think that you want the parent to be a device somehow > > associated with the thermal zone, but how exactly? What should that > > be in the Wifi driver case, the PCI device or something else? > > > > And what if the thermal zone affects multiple devices? Which of them > > (if any) would be its parent? And would it be consistent with the > > ACPI case described above? > > > > All of that needs consideration IMV. > > I agree, but there is a difference between "this struct thermal_zone_device depends on > device X to be operational" and "this thermal zone affects device X, device Y and device Z". Yes, there is. > This patch series exclusively deals with telling the driver core that "this struct thermal_zone_device > depends on device X to be operational". 
Maybe let's take care of cooling devices first and get back to this later?

From W_Armin at gmx.de Thu Nov 27 15:49:47 2025
From: W_Armin at gmx.de (Armin Wolf)
Date: Fri, 28 Nov 2025 00:49:47 +0100
Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices
In-Reply-To:
References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de>
Message-ID:

On 27.11.25 at 22:46, Rafael J. Wysocki wrote: > On Thu, Nov 27, 2025 at 9:06 PM Armin Wolf wrote: >> On 27.11.25 at 18:41, Rafael J. Wysocki wrote: >> >>> On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf wrote: >>>> On 21.11.25 at 21:35, Rafael J. Wysocki wrote: >>>> >>>>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote: >>>>>> Drivers registering thermal zone/cooling devices are currently unable >>>>>> to tell the thermal core what parent device the new thermal zone/ >>>>>> cooling device should have, potentially causing issues with suspend >>>>>> ordering >>>>> This is one potential class of problems that may arise, but I would >>>>> like to see a real example of this. >>>>> >>>>> As it stands today, thermal_class has no PM callbacks, so there are no >>>>> callback execution ordering issues with devices in that class and what >>>>> other suspend/resume ordering issues are there? >>>> Correct, that is why i said "potentially". >>>> >>>>> Also, the suspend and resume of thermal zones is handled via PM >>>>> notifiers. Is there a problem with this? >>>> The problem with PM notifiers is that thermal zones stop working even before >>>> user space is frozen. Freezing user space might take a lot of time, so having >>>> no thermal management during this period is less than ideal. >>> This can be addressed by doing thermal zone suspend after freezing >>> tasks and before starting to suspend devices. Accordingly, thermal >>> zones could be resumed after resuming devices and before thawing >>> tasks.
That should not be an overly complex change to make. >> AFAIK this is only possible by using dev_pm_ops, > Of course it is not the case. > > For example, thermal_pm_notify_prepare() could be called directly from > dpm_prepare() and thermal_pm_notify_complete() could be called > directly from dpm_complete() (which would require switching over > thermal to a non-freezable workqueue). > >> the PM notifier is triggered before tasks are frozen during suspend and after they are thawed during resume. > I know that. > >> Using dev_pm_ops would also ensure that thermal zone devices are resumed after their >> parent devices, so no additional changes inside the pm core would be needed. > Not really. thermal_pm_suspended needs to be set and cleared from somewhere. thermal_pm_suspended is only used for initializing the state of thermal zone devices registered during a suspend transition. This is currently needed because user space tasks are still operational when the PM notifier callback is called, so we have to be prepared for new thermal zone devices being registered in the middle of a suspend transition. When using dev_pm_ops, new thermal zone devices cannot appear in the middle of a suspend transition, as this would violate the restraints of the device core regarding device registrations. Because of this thermal_pm_suspended can be removed once we use dev_pm_ops. >>>> This problem would not occur when using dev_pm_ops, as thermal zones would be >>>> suspended after user space has been frozen successfully. Additionally, when using >>>> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates >>>> that no new devices (including thermal zones and cooling devices) be registered during >>>> a suspend/resume cycle. >>>> >>>> Replacing the PM notifiers with dev_pm_ops would of course be a optimization with >>>> its own patch series. >>> Honestly, I don't see much benefit from using dev_pm_ops for thermal >>> zone devices and cooling devices. 
Moreover, I actually think that >>> they could be "no PM" devices that are not even put on the >>> suspend-resume device list. Technically, they are just interfaces on >>> top of some other devices allowing the user space to interact with the >>> latter and combining different pieces described by the platform >>> firmware. They by themselves have no PM capabilities. >> Correct, thermal zone devices are virtual devices representing thermal management >> aspects of the underlying parent device. This however does not mean that thermal zone >> devices have no PM capabilities, because they contain state. Some part of this state >> (namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management, >> so we should tell the device core about this by using dev_pm_ops instead of the PM notifier. > Changing the zone state to anything different from TZ_STATE_READY > causes __thermal_zone_device_update() to do nothing and this is the > whole "suspend". It does not need to be done from a PM callback and I > see no reason why doing it from a PM callback would be desirable. > Sorry. > > Apart from the above, TZ_STATE_FLAG_SUSPENDED and > TZ_STATE_FLAG_RESUMING are only used for coordination between > thermal_zone_pm_prepare(), thermal_zone_device_resume() and > thermal_zone_pm_complete(), so this is not a state anything other than > the specific thermal zone in question cares about. AFAIK this is not completely true: once TZ_STATE_FLAG_SUSPENDED is set, __thermal_zone_device_update() will stop polling said device (as you said). This is not only important for the thermal zone device itself, but also for the underlying device driver, as it has to make sure that the thermal zone callbacks do not access an already suspended hardware device. > Moreover, resuming a thermal zone before resuming any cooling devices > bound to it would almost certainly break things and I'm not sure how > you would make that work with dev_pm_ops.
BTW, using device links for > this is not an option as far as I'm concerned. We could simply resume the thermal zones inside the .complete callback. The cooling devices will already be operational when said complete callback is being called by the PM core, due to the resume phase having been completed already. >>>>>> and making it impossible for user space applications to >>>>>> associate a given thermal zone device with its parent device. >>>>> Why does user space need to know the parent of a given cooling device >>>>> or thermal zone? >>>> Lets say that we have two thermal zones registered by two instances of the >>>> Intel Wifi driver. User space is currently unable to find out which thermal zone >>>> belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]"). >>> But the "belong" part is not quite well defined here. I think that >>> what user space needs to know is what devices are located in a given >>> thermal zone, isn't it? Knowing the parent doesn't necessarily >>> address this. >> The device exposing a given thermal zone device is not always a member of the thermal zone itself. >> In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone >> associated with their thermal zone device. But thermal zones created thru a system management controller >> for example might only cover devices like the CPUs and GPUs, not the system management controller device itself. > Well, exactly. > >> The parent device of a child device is the upstream device of the child device. The connection between parent >> and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical >> (PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent >> and a child device (the child device cannot function without its parent being operational), and user space >> might want to be able to discover such dependencies. 
> But this needs to be consistent. > > If the parent of one thermal zone represents the device affected by it > and the parent of another thermal zone represents something else, user > space will need platform-specific knowledge to figure this out, which > is the case today. Without consistency, this is just not useful. I think there is a misunderstanding here: describing the devices affected by a given thermal zone has nothing to do with the parent-child dependency between a thermal zone device and its parent device. This parent-child dependency only states that: "This thermal zone device is descended from this parent device. It might thus depend on said parent device to be operational." >>>> This problem would be solved once we populate the parent device pointer inside the thermal zone >>>> device, as user space can simply look at the "device" symlink to determine the parent device behind >>>> a given thermal zone device. >>> I'm not convinced about this. >>> >>>> Additionally, being able to access the acpi_handle of the parent device will be necessary for the >>>> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors. >>> I guess by the "parent" you mean the device represented in the ACPI >>> namespace by a ThermalZone object, right? But this is not the same as >>> the "parent" in the Wifi driver context, is it? >> In the context of an ACPI ThermalZone, the parent device of the thermal cooling device would currently >> be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent >> device would be the PCI device bound to the corresponding Intel Wifi driver. >> >> I think you misunderstood what kind of parent device I was referring to. You likely thought that I was referring >> to the parent device of the ACPI ThermalZone, right? > No. I thought that you were referring to the ACPI ThermalZone itself.
> Or rather, a platform device associated with the ACPI ThermalZone > (that is, the device the ACPI ThermalZone is the ACPI_COMPANION() of). That is correct. >> That however is not the case; with "parent device" I was >> referring to the device responsible for creating a given struct thermal_zone_device instance. > So I was not confused. > >>>>>> This patch series aims to fix this issue by extending the functions >>>>>> used to register thermal zone/cooling devices to also accept a parent >>>>>> device pointer. The first six patches convert all functions used for >>>>>> registering cooling devices, while the functions used for registering >>>>>> thermal zone devices are converted by the remaining two patches. >>>>>> >>>>>> I tested this series on various devices containing (among others): >>>>>> - ACPI thermal zones >>>>>> - ACPI processor devices >>>>>> - PCIe cooling devices >>>>>> - Intel Wifi card >>>>>> - Intel powerclamp >>>>>> - Intel TCC cooling >>>>> What exactly did you do to test it? >>>> I tested: >>>> - the thermal zone temperature readout >>>> - correctness of the new sysfs links >>>> - suspend/resume >>>> >>>> I also verified that ACPI thermal zones still bind with the ACPI fans. >>> I see, thanks. >>> >>>>>> I also compile-tested the remaining affected drivers, however I would >>>>>> still be happy if the relevant maintainers (especially those of the >>>>>> Mellanox Ethernet switch driver) could take a quick glance at the >>>>>> code and verify that I am using the correct device as the parent >>>>>> device. >>>>> I think that the above paragraph is not relevant any more? >>>> You are right, however I originally meant to CC the Mellanox maintainers as >>>> I was a bit unsure about the changes I made to their driver. I will rework >>>> this section in the next revision and CC the Mellanox maintainers. >>>> >>>>>> This work is also necessary for extending the ACPI thermal zone driver >>>>>> to support the _TZD ACPI object in the future.
>>>>> I'm still unsure why _TZD support requires the ability to set a >>>>> thermal zone parent device. >>>> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans >>>> and ACPI processors, like ACPI batteries. >>> No, it is not for cooling devices if my reading of the specification >>> is correct. It says: >>> >>> "_TZD (Thermal Zone Devices) >>> >>> This optional object evaluates to a package of device names. Each name >>> corresponds to a device in the ACPI namespace that is associated with >>> the thermal zone. The temperature reported by the thermal zone is >>> roughly correspondent to that of each of the devices." >>> >>> And then >>> >>> "The list of devices returned by the control method need not be a >>> complete and absolute list of devices affected by the thermal zone. >>> However, the package should at least contain the devices that would >>> uniquely identify where this thermal zone is located in the machine. >>> For example, a thermal zone in a docking station should include a >>> device in the docking station, a thermal zone for the CD-ROM bay, >>> should include the CD-ROM." >>> >>> So IIUC this is a list of devices allowing the location of the thermal >>> zone to be figured out. There's nothing about cooling in this >>> definition. >> Using _TZD to figure out the location of a given thermal zone is another usage >> of this ACPI control method, but let's take a look at section 11.6: >> >> - If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist. >> - If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the >> _TZD device list or devices' _TZM objects, must support device performance states. >> >> So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling.
> But it doesn't actually say how those "device performance states" are > supposed to be used for cooling, does it? Well, ACPI specifies how passive cooling should be done using percentage values between 0% and 100%, so this part is actually specified. >> This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an >> extension of _PSL for things like ACPI control method batteries (see 10.2.2.12). > But not everything in _TZD needs to be a potential "cooling device" > and how you'll decide which one is? Devices in _TZD that have no cooling capability will simply never register any cooling devices. This means that the .should_bind callback of the ACPI thermal zone will never see those devices. Only devices in _TZD that also have the ability for (passive) cooling will register a cooling device, so only those devices will end up with the .should_bind callback of the ACPI thermal zone. The ACPI thermal zone treats _TZD as a list of ACPI handles. If some of those handles are unused, then this is totally fine. >> Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide >> section "Thermally managed devices" paragraph "Processor aggregator"). > Interesting. > > I agree that it would make sense to follow them because there will be > platform dependencies on that, if there aren't already. My primary goal is to improve the Linux thermal subsystem to be as powerful as the Windows thermal subsystem. This means that we must stop viewing _PSL, _ALx and _TZD as something that only works with a predefined set of devices. Instead we must view _PSL, _ALx and _TZD as something similar to the cooling-maps used for connecting thermal zones and cooling devices on OF-based systems. >>>> This however will currently not work as >>>> the ACPI thermal zone driver uses the private drvdata of the cooling device to >>>> determine if said cooling device should bind. 
This only works for ACPI fans and >>>> processors due to the fact that those drivers store an ACPI device pointer inside >>>> drvdata, something the ACPI thermal zone expects. >>> I'm not sure I understand the above. >>> >>> There is a list of ACPI device handles per trip point, as returned by >>> either _PSL or _ALx. Devices whose handles are in that list will be >>> bound to the thermal zone, so long as there are struct acpi_device >>> objects representing them which is verified with the help of the >>> devdata field in struct thermal_cooling_device. >> AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state >> container struct of the associated device driver instance. Assuming that a given device driver >> will populate devdata with a pointer to its ACPI companion device is an implementation-specific >> detail that does not apply to all cooling device implementations. It just so happens that the >> ACPI processor and fan driver do this, likely because they were designed specifically to work >> with the ACPI thermal zone driver. >> >> The documentation of thermal_cooling_device_register() even describes devdata as "device private data", so any meaning of devdata purely depends on the >> given device driver. > Yes, and these particular drivers decide to store a pointer to struct > acpi_device in it. > > But this is not super important, they might as well set the > ACPI_COMPANION() of the cooling device to the corresponding struct > acpi_device and the ACPI thermal driver might use that information. > > I'm not opposed to using parents for this purpose, but it doesn't > change the big picture that the ACPI thermal driver will need to know > the ACPI handle corresponding to each cooling device. > > If you want to use _TZD instead of or in addition to _PSL for this, it > doesn't change much here, it's just another list of ACPI handles, so > saying that parents are needed for supporting this is not exactly > accurate IMV.
My idea was something like this:

	/* Cooling devices without a parent device cannot be referenced using ACPI */
	if (!cdev->device.parent)
		return false;

	/* Not all devices are described inside the ACPI tables */
	acpi_handle cdev_handle = ACPI_HANDLE(cdev->device.parent);
	if (!cdev_handle)
		return false;

	for (i = 0; i < acpi_trip->devices.count; i++) {
		acpi_handle handle = acpi_trip->devices.handles[i];

		if (handle == cdev_handle)
			return true;
	}

This only works if the parent device pointer of the cooling device is populated.

>>> IOW, cooling device drivers that create struct thermal_cooling_device >>> objects representing them are expected to set devdata in those objects >>> to point to struct acpi_device objects corresponding to their ACPI >>> handles, but in principle acpi_thermal_should_bind_cdev() might as >>> well just use the handles themselves. It just needs to know that >>> there is a cooling driver on the other side of the ACPI handle. >>> >>> The point is that a cooling device to be bound to an ACPI thermal zone >>> needs an ACPI handle in the first place to be listed in _PSL or _ALx. >> Correct, I merely change the way the ACPI thermal zone driver retrieves the >> ACPI handle associated with a given cooling device. > Right. > >>>> As we cannot require all cooling devices to store an ACPI device pointer inside >>>> their drvdata field in order to support ACPI, >>> Cooling devices don't store ACPI device pointers in struct >>> thermal_cooling_device objects, ACPI cooling drivers do, and there are >>> two reasons to do that: (1) to associate a given struct >>> thermal_cooling_device with an ACPI handle and (2) to let >>> acpi_thermal_should_bind_cdev() know that the cooling device is >>> present and functional. >>> >>> This can be changed to store an ACPI handle in struct >>> thermal_cooling_device and acpi_thermal_should_bind_cdev() may just >>> verify that the device is there by itself.
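For readers following along outside a kernel tree, the handle comparison sketched above can be modelled in plain userspace C. This is only an illustrative sketch and not code from the series: model_handle, struct model_device, struct model_trip and model_should_bind() are hypothetical stand-ins for acpi_handle, the device with its parent pointer, the per-trip handle list and the should-bind check. The only behaviour it claims to show is the one described in the mail: no parent or no firmware handle means no binding, otherwise the parent's handle must appear in the trip's list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; none of this is real kernel API. */
typedef const void *model_handle;	/* plays the role of acpi_handle */

struct model_device {
	const struct model_device *parent;	/* NULL when no parent was set */
	model_handle fw_handle;			/* NULL when not described by firmware */
};

struct model_trip {
	const model_handle *handles;		/* handles listed for this trip point */
	size_t count;
};

/* Return true only if the cooling device's parent is listed for the trip. */
static bool model_should_bind(const struct model_trip *trip,
			      const struct model_device *cdev)
{
	size_t i;

	/* Without a parent there is nothing to look a handle up on. */
	if (!cdev->parent)
		return false;

	/* The parent exists but is not described by firmware. */
	if (!cdev->parent->fw_handle)
		return false;

	for (i = 0; i < trip->count; i++) {
		if (trip->handles[i] == cdev->parent->fw_handle)
			return true;
	}

	return false;
}
```

Note that the model ends with an explicit "return false" for a parent whose handle is not listed, which the fragment in the mail leaves implicit.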
>> I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that >> can be used for both ACPI and OF based cooling device identification, if this is what you >> prefer. > I'm not sure about this ATM and see below. > >> This patch series would then turn into a cleanup series, focusing on properly adding >> thermal zone devices and cooling devices into the global device hierarchy. > I'd prefer to do one thing at a time though. > > If you want cooling devices to get parents, fine. I'm not > fundamentally opposed to that idea, but let's have clear rules for > device drivers on how to set those parents for the sake of > consistency. > > As for the ACPI case, one rule that I want to be followed (as already > stated multiple times) is that a struct acpi_device can only be a > parent of another struct acpi_device. This means that the parent of a > cooling device needs to be a platform device or similar representing > the actual device that will be used for implementing the cooling. OK. > A separate question is how acpi_thermal_should_bind_cdev() will match > cooling devices with the ACPI handles coming from _PSL, _ALx, _TZD > etc. and the rule can be that it will look at the ACPI_COMPANION() of > the parent of the given cooling device. See the example code i pasted above, the whole matching is done using ACPI handles, so we can completely leave ACPI_COMPANION() out of this. >>>> we must use a more generic approach. >>> I'm not sure what use case you are talking about. >>> >>> Surely, devices with no representation in the ACPI namespace cannot be >>> bound to ACPI thermal zones. For devices that have a representation >>> in the ACPI namespace, storing an ACPI handle in devdata should not be >>> a problem. >> See my above explanations for details, drvdata is defined to hold device private data, >> nothing more. > This is related to the discussion below. 
> >>>> I was thinking about using the acpi_handle of the parent device instead of messing >>>> with the drvdata field, but this only works if the parent device pointer of the >>>> cooling device is populated. >>>> >>>> (Cooling devices without a parent device would then be ignored by the ACPI thermal >>>> zone driver, as such cooling devices cannot be linked to ACPI). >>> It can be arranged this way, but what's the practical difference? >>> Anyone who creates a struct thermal_cooling_device and can set its >>> parent pointer to a device with an ACPI companion, may as well set its >>> devdata to point to that companion directly - or to its ACPI handle if >>> that's preferred. >> Yes, but this would require explicit support for ACPI in every driver that registers cooling devices. > So you want to have generic drivers that may work on ACPI platforms > and on DT platforms to be able to create cooling devices for use with > ACPI thermal zones. Well, had you started the whole discussion with > this statement, it would have been much easier to understand your > point. Sorry for the messy discussion, i intended to have two separate patch series. This one was meant to simply be a preparation, with the important changes inside the ACPI thermal zone driver being implemented with the second patch series. That was also the reason why i send this series as an RFC. >> Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle >> of their choice when creating a cooling device will fix this. > If you go the parents route, this is an important consideration for > the rules on how to set those parents. Namely, they would need to be > set so that the fwnode_handle of the parent could be used for binding > the cooling device to a thermal zone either on ACPI or on DT systems. > > Of course, there are also cooling devices whose parents will not have > an fwnode_handle and they would still need to work in this brave new > world. 
> True, I did not think of that. In this case extending thermal_of_cooling_device_register() and friends to accept a generic fwnode_handle instead of an OF-specific device_node would make more sense. Most drivers can simply pass the result of dev_fwnode() instead of dev->of_node; only those that support multiple cooling device child nodes would need additional work to also support ACPI. Basically, thermal_of_get_cooling_spec() could handle the fwnode_handle in the following manner:

	if (cooling_spec.np->fwnode != cdev->fwnode)
		return false;

And the ACPI thermal zone driver could then simply use ACPI_HANDLE_FWNODE() to retrieve the ACPI handle from the fwnode_handle (together with a NULL check of course). If you are OK with this approach, I will forget about the whole parent device stuff for now and focus on extending (devm_)thermal_of_cooling_device_register(). There are some additional changes needed for reliably associating cooling devices to ACPI trip points using fwnode handles, but those are not that intrusive. What do you think? Thanks, Armin Wolf

From W_Armin at gmx.de Thu Nov 27 15:52:47 2025
From: W_Armin at gmx.de (Armin Wolf)
Date: Fri, 28 Nov 2025 00:52:47 +0100
Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices
In-Reply-To:
References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de>
Message-ID:

On 27.11.25 at 23:14, Rafael J. Wysocki wrote: > On Thu, Nov 27, 2025 at 9:29 PM Armin Wolf wrote: >> On 27.11.25 at 19:22, Rafael J. Wysocki wrote: >> >>> On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf wrote: >>>> On 21.11.25 at 21:35, Rafael J. Wysocki wrote: >>>> >>>>> On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf wrote: >>> [...]
>>> >>>>>> --- >>>>>> Armin Wolf (8): >>>>>> thermal: core: Allow setting the parent device of cooling devices >>>>>> thermal: core: Set parent device in thermal_of_cooling_device_register() >>>>>> ACPI: processor: Stop creating "device" sysfs link >>>>> That link is not to the cooling devices' parent, but to the ACPI >>>>> device object (a struct acpi_device) that corresponds to the parent. >>>>> The parent of the cooling device should be the processor device, not >>>>> its ACPI companion, so I'm not sure why there would be a conflict. >>>> From the perspective of the Linux device core, a parent device does not have to be >>>> a "physical" device. In the case of the ACPI processor driver, the ACPI device is used, >>>> so the cooling device registered by said driver belongs to the ACPI device. >>> Well, that's a problem. A struct acpi_device should not be a parent >>> of anything other than a struct acpi_device. >> Understandable, in this case we should indeed use the CPU device, especially since the fwnode >> associated with it already points to the correct ACPI processor object (at least on my machine). >> >>>> I agree that using the Linux processor device would make more sense, but this will require >>>> changes inside the ACPI processor driver. >>> So be it. >> OK. >> >>>> As for the "device" symlink: The conflict would be a naming conflict, as both "device" symlinks >>>> (the one created by the ACPI processor driver and the one created by the device core) will >>>> be created in the same directory (which is the directory of the cooling device). >>> I see. >>> >>> But why is the new symlink needed in the first place? If the device >>> has a parent, it will appear under that parent in /sys/devices/, won't >>> it? >>> >>> Currently, all of the thermal class devices appear under >>> /sys/devices/virtual/thermal/ because they have no parents and they >>> all get a class parent kobject under /sys/devices/virtual/, as that's >>> what get_device_parent() does.
>>> >>> If they have real parents, they will appear under those parents, so >>> why will the parents need to be pointed to additionally? >> The "device" symlink is a comfort feature provided by the device core itself to allow user space >> applications to traverse the device tree from bottom to top, like a doubly linked list. We cannot >> disable the creation of this symlink, nor should we. > I think you mean device_add_class_symlinks(), but that's just for > class devices. Of course, thermal devices are class devices, so > they'll get those links if they get parents. Fair enough. > >>> BTW, this means that the layout of /sys/devices/ will change when >>> thermal devices get real parents. I'm not sure if this is a problem, >>> but certainly something to note. >> I know, most applications likely use /sys/class/thermal/, so they are not impacted by this. I will >> note this in the cover letter of the next revision. >> >>>>>> ACPI: fan: Stop creating "device" sysfs link >>>>>> ACPI: video: Stop creating "device" sysfs link >>>>> Analogously in the above two cases AFAICS. >>>>> >>>>> The parent of a cooling device should be a "physical" device object, >>>>> like a platform device or a PCI device or similar, not a struct >>>>> acpi_device (which in fact is not a device even). >>>> From the perspective of the Linux device core, an ACPI device is a perfectly valid device. >>> The driver core is irrelevant here. >>> >>> As I said before, a struct acpi_device object should not be a parent >>> of anything other than a struct acpi_device object. Those things are >>> not devices and they cannot be used for representing PM dependencies, >>> for example. >>> >>>> I agree that using a platform device or PCI device is better, but this already happens >>>> inside the ACPI fan driver (platform device). >>> So it should not happen there.
>> I meant that the ACPI fan driver already uses the platform device as the parent device of the >> cooling device, so the ACPI device is only used for interacting with the ACPI control methods >> (and registering sysfs attributes i think). > OK > >>>> Only the ACPI video driver created a "device" sysfs link that points to the ACPI device >>>> instead of the PCI device. I just noticed that i accidentally changed this by using the >>>> PCI device as the parent device for the cooling device. >>>> >>>> If you want then we can keep this change. >>> The PCI device should be its parent. >> Alright, i will note this in the patch description. >> >>>>>> thermal: core: Set parent device in thermal_cooling_device_register() >>>>>> ACPI: thermal: Stop creating "device" sysfs link >>>>> And this link is to the struct acpi_device representing the thermal zone itself. >>>> Correct, the ACPI thermal zone driver is a ACPI driver, meaning that he binds to >>>> ACPI devices. Because of this all (thermal zone) devices created by an instance of >>>> said driver are descendants of the ACPI device said instance is bound to. >>>> >>>> We can of course convert the ACPI thermal zone driver into a platform driver, but >>>> this would be a separate patch series. >>> If you want parents, this needs to be done first, but I'm still not >>> sure what the parent of a thermal zone would represent. >>> >>> In the ACPI case it is kind of easy - it would be the (platform) >>> device corresponding to a given ThermalZone object in the ACPI >>> namespace - but it only has a practical meaning if that device has a >>> specific parent. For example, if the corresponding ThermalZone object >>> is present in the \_SB scope, the presence of the thermal zone parent >>> won't provide any additional information. >> To the device core it will, as the platform device will need to be suspended >> after the thermal zone device has been suspended, among other things. 
> Let's set suspend aside for now, I think I've explained my viewpoint > on this enough elsewhere. > Agreed. >>> Unfortunately, the language in the specification isn't particularly >>> helpful here: "Thermal zone objects should appear in the namespace >>> under the portion of the system that comprises the thermal zone. For >>> example, a thermal zone that is isolated to a docking station should >>> be defined within the scope of the docking station device." To me >>> "the portion of the system" is not too meaningful unless it is just >>> one device without children. That's why _TZD has been added AFAICS. >> I think you are confusing the parent device of the ThermalZone ACPI device >> with the parent device of the struct thermal_zone_device. > No, I'm not. > >> I begin to wonder if mentioning the ACPI ThermalZone device together with the >> struct thermal_zone_device was a bad idea on my side xd. > Maybe. > >>>>>> thermal: core: Allow setting the parent device of thermal zone devices >>>>> I'm not sure if this is a good idea, at least until it is clear what >>>>> the role of a thermal zone parent device should be. >>>> Take a look at my explanation with the Intel Wifi driver. >>> I did and I think that you want the parent to be a device somehow >>> associated with the thermal zone, but how exactly? What should that >>> be in the Wifi driver case, the PCI device or something else? >>> >>> And what if the thermal zone affects multiple devices? Which of them >>> (if any) would be its parent? And would it be consistent with the >>> ACPI case described above? >>> >>> All of that needs consideration IMV. >> I agree, but there is a difference between "this struct thermal_zone_device depends on >> device X to be operational" and "this thermal zone affects device X, device Y and device Z". > Yes, there is. > >> This patch series exclusively deals with telling the driver core that "this struct thermal_zone_device >> depends on device X to be operational". 
> Maybe let's take care of cooling devices first and get back to this later? > Agreed. From rafael at kernel.org Fri Nov 28 03:40:44 2025 From: rafael at kernel.org (Rafael J. Wysocki) Date: Fri, 28 Nov 2025 12:40:44 +0100 Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices In-Reply-To: References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de> Message-ID: On Fri, Nov 28, 2025 at 12:50?AM Armin Wolf wrote: > > Am 27.11.25 um 22:46 schrieb Rafael J. Wysocki: > > > On Thu, Nov 27, 2025 at 9:06?PM Armin Wolf wrote: > >> Am 27.11.25 um 18:41 schrieb Rafael J. Wysocki: > >> > >>> On Sat, Nov 22, 2025 at 3:18?PM Armin Wolf wrote: > >>>> Am 21.11.25 um 21:35 schrieb Rafael J. Wysocki: > >>>> > >>>>> On Thu, Nov 20, 2025 at 4:41?AM Armin Wolf wrote: > >>>>>> Drivers registering thermal zone/cooling devices are currently unable > >>>>>> to tell the thermal core what parent device the new thermal zone/ > >>>>>> cooling device should have, potentially causing issues with suspend > >>>>>> ordering > >>>>> This is one potential class of problems that may arise, but I would > >>>>> like to see a real example of this. > >>>>> > >>>>> As it stands today, thermal_class has no PM callbacks, so there are no > >>>>> callback execution ordering issues with devices in that class and what > >>>>> other suspend/resume ordering issues are there? > >>>> Correct, that is why i said "potentially". > >>>> > >>>>> Also, the suspend and resume of thermal zones is handled via PM > >>>>> notifiers. Is there a problem with this? > >>>> The problem with PM notifiers is that thermal zones stop working even before > >>>> user space is frozen. Freezing user space might take a lot of time, so having > >>>> no thermal management during this period is less than ideal. > >>> This can be addressed by doing thermal zone suspend after freezing > >>> tasks and before starting to suspend devices. 
Accordingly, thermal > >>> zones could be resumed after resuming devices and before thawing > >>> tasks. That should not be an overly complex change to make. > >> AFAIK this is only possible by using dev_pm_ops, > > Of course it is not the case. > > > > For example, thermal_pm_notify_prepare() could be called directly from > > dpm_prepare() and thermal_pm_notify_complete() could be called > > directly from dpm_complete() (which would require switching over > > thermal to a non-freezable workqueue). > > > >> the PM notifier is triggered before tasks are frozen during suspend and after they are thawed during resume. > > I know that. > > > >> Using dev_pm_ops would also ensure that thermal zone devices are resumed after their > >> parent devices, so no additional changes inside the pm core would be needed. > > Not really. thermal_pm_suspended needs to be set and cleared from somewhere. > > thermal_pm_suspended is only used for initializing the state of thermal zone devices registered > during a suspend transition. This is currently needed because user space tasks are still operational > when the PM notifier callback is called, so we have to be prepared for new thermal zone devices > being registered in the middle of a suspend transition. > > When using dev_pm_ops, new thermal zone devices cannot appear in the middle of a suspend transition, > as this would violate the restraints of the device core regarding device registrations. Because of > this thermal_pm_suspended can be removed once we use dev_pm_ops. No, we are not going to use dev_pm_ops for thermal zone suspend. That would be adding complexity just for the sake of it IMV. > >>>> This problem would not occur when using dev_pm_ops, as thermal zones would be > >>>> suspended after user space has been frozen successfully. 
Additionally, when using > >>>> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates > >>>> that no new devices (including thermal zones and cooling devices) be registered during > >>>> a suspend/resume cycle. > >>>> > >>>> Replacing the PM notifiers with dev_pm_ops would of course be a optimization with > >>>> its own patch series. > >>> Honestly, I don't see much benefit from using dev_pm_ops for thermal > >>> zone devices and cooling devices. Moreover, I actually think that > >>> they could be "no PM" devices that are not even put on the > >>> suspend-resume device list. Technically, they are just interfaces on > >>> top of some other devices allowing the user space to interact with the > >>> latter and combining different pieces described by the platform > >>> firmware. They by themselves have no PM capabilities. > >> Correct, thermal zone devices are virtual devices representing thermal management > >> aspects of the underlying parent device. This however does not mean that thermal zone > >> devices have no PM capabilities, because they contain state. Some part of this state > >> (namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management, > >> so we should tell the device core about this by using dev_pm_ops instead of the PM notifier. > > Changing the zone state to anything different from TZ_STATE_READY > > causes __thermal_zone_device_update() to do nothing and this is the > > whole "suspend". It does not need to be done from a PM callback and I > > see no reason why doing it from a PM callback would be desirable. > > Sorry. > > > > Apart from the above, TZ_STATE_FLAG_SUSPENDED and > > TZ_STATE_FLAG_RESUMING are only used for coordination between > > thermal_zone_pm_prepare(), thermal_zone_device_resume() and > > thermal_zone_pm_complete(), so this is not a state anything other then > > the specific thermal zone in question cares about. 
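
As a rough illustration of the point quoted above (any zone state other than TZ_STATE_READY makes __thermal_zone_device_update() do nothing, and that is the whole "suspend"), here is a small user-space mock-up in C. It is not kernel code: every mock_ type and function is hypothetical, the flag bit values are assumptions chosen for the example, and the resume step is collapsed into one synchronous call for brevity.

```c
#include <stdbool.h>

/* Assumed bit layout, for illustration only. */
#define MOCK_TZ_STATE_READY		0u
#define MOCK_TZ_STATE_FLAG_SUSPENDED	(1u << 0)
#define MOCK_TZ_STATE_FLAG_RESUMING	(1u << 1)

/* Hypothetical stand-in for a thermal zone; not the kernel structure. */
struct mock_zone {
	unsigned int state;	/* MOCK_TZ_STATE_READY or a set of flags */
	unsigned int updates;	/* number of updates that actually ran */
};

/* An update only does work while the zone is in the READY state. */
static bool mock_zone_update(struct mock_zone *tz)
{
	if (tz->state != MOCK_TZ_STATE_READY)
		return false;	/* bail out early: the zone is "suspended" */

	tz->updates++;
	return true;
}

/* "Suspending" a zone is nothing more than setting a state flag. */
static void mock_zone_suspend(struct mock_zone *tz)
{
	tz->state |= MOCK_TZ_STATE_FLAG_SUSPENDED;
}

/* Simplified resume: clear both PM-related flags in one step. */
static void mock_zone_resume(struct mock_zone *tz)
{
	tz->state &= ~(MOCK_TZ_STATE_FLAG_SUSPENDED |
		       MOCK_TZ_STATE_FLAG_RESUMING);
}
```

In this model nothing outside the zone needs to observe the flags, which is the argument made in the quoted text for not involving a PM callback.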
> > AFAIK this is not completely true, once TZ_STATE_FLAG_SUSPENDED is set, > __thermal_zone_device_update() will stop polling said device (as you said). > This is not only important for the thermal zone device itself, but also for > the underlying device driver as he has to make sure that the thermal zone > callbacks do not access an already suspended hardware device. Which callbacks in particular do you mean? That would need to be something that is not called from either __thermal_zone_device_update() because it is going to bail out early or user space because it is frozen. So what is left? Seriously, if the only problem with the existing thermal zone suspend and resume is that they are done from a PM notifier, I don't think addressing this requires involving dev_pm_ops and it will be very hard to convince me otherwise. > > Moreover, resuming a thermal zone before resuming any cooling devices > > bound to it would almost certainly break things and I'm not sure how > > you would make that work with dev_pm_ops. BTW, using device links for > > this is not an option as far as I'm concerned. > > We could simply resume the thermal zones inside the .complete callback. > The cooling devices will already be operational when said complete callback > is being called by the PM core, due to the resume phase having been completed > already. But then it would be synchronous, wouldn't it? Or if you want to start async handling from a .complete callback then I don't see a point. > >>>>>> and making it impossible for user space applications to > >>>>>> associate a given thermal zone device with its parent device. > >>>>> Why does user space need to know the parent of a given cooling device > >>>>> or thermal zone? > >>>> Lets say that we have two thermal zones registered by two instances of the > >>>> Intel Wifi driver. 
User space is currently unable to find out which thermal zone > >>>> belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]"). > >>> But the "belong" part is not quite well defined here. I think that > >>> what user space needs to know is what devices are located in a given > >>> thermal zone, isn't it? Knowing the parent doesn't necessarily > >>> address this. > >> The device exposing a given thermal zone device is not always a member of the thermal zone itself. > >> In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone > >> associated with their thermal zone device. But thermal zones created thru a system management controller > >> for example might only cover devices like the CPUs and GPUs, not the system management controller device itself. > > Well, exactly. > > > >> The parent device of a child device is the upstream device of the child device. The connection between parent > >> and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical > >> (PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent > >> and a child device (the child device cannot function without its parent being operational), and user space > >> might want to be able to discover such dependencies. > > But this needs to be consistent. > > > > If the parent of one thermal zone represents the device affected by it > > and the parent of another thermal zone represents something else, user > > space will need platform-specific knowledge to figure this out, which > > is the case today. Without consistency, this is just not useful. > > I think there is a misunderstanding here, describing the devices affected by a given thermal zone > has nothing to do with the parent-child dependency between a thermal zone device and its parent device. 
> This parent-child dependency only states that: > > "This thermal zone device is descended from this parent device. It might thus depend on > said parent device to be operational." So you are postulating that the parent of a thermal zone should be the device providing the thermal sensor or otherwise a mechanism allowing temperature to be read. That is precise enough as far as I'm concerned. > >>>> This problem would be solved once we populate the parent device pointer inside the thermal zone > >>>> device, as user space can simply look at the "device" symlink to determine the parent device behind > >>>> a given thermal zone device. > >>> I'm not convinced about this. > >>> > >>>> Additionally, being able to access the acpi_handle of the parent device will be necessary for the > >>>> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors. > >>> I guess by the "parent" you mean the device represented in the ACPI > >>> namespace by a ThermalZone object, right? But this is not the same as > >>> the "parent" in the Wifi driver context, is it? > >> In the context of a ACPI ThermalZone, the parent device of the thermal cooling device would currently > >> be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent > >> device would be PCI device bound to the corresponding Intel Wifi driver. > >> > >> I think you misunderstood what kind of parent device i was referring to. You likely though that i was referring > >> to the parent device of the ACPI ThermalZone, right? > > No. I thought that you were referring to the ACPI ThermalZone itself. > > Or rather, a platform device associated with the ACPI ThermalZone > > (that is, the device the ACPI ThermalZone in the ACPI_COMPAION() of). > > That is correct. > > >> That however is not the case , with "parent device" i was > >> referring to the device responsible for creating a given struct thermal_zone_device instance. > > So I was not confused. 
> > > >>>>>> This patch series aims to fix this issue by extending the functions > >>>>>> used to register thermal zone/cooling devices to also accept a parent > >>>>>> device pointer. The first six patches convert all functions used for > >>>>>> registering cooling devices, while the functions used for registering > >>>>>> thermal zone devices are converted by the remaining two patches. > >>>>>> > >>>>>> I tested this series on various devices containing (among others): > >>>>>> - ACPI thermal zones > >>>>>> - ACPI processor devices > >>>>>> - PCIe cooling devices > >>>>>> - Intel Wifi card > >>>>>> - Intel powerclamp > >>>>>> - Intel TCC cooling > >>>>> What exactly did you do to test it? > >>>> I tested: > >>>> - the thermal zone temperature readout > >>>> - correctness of the new sysfs links > >>>> - suspend/resume > >>>> > >>>> I also verified that ACPI thermal zones still bind with the ACPI fans. > >>> I see, thanks. > >>> > >>>>>> I also compile-tested the remaining affected drivers, however i would > >>>>>> still be happy if the relevant maintainers (especially those of the > >>>>>> mellanox ethernet switch driver) could take a quick glance at the > >>>>>> code and verify that i am using the correct device as the parent > >>>>>> device. > >>>>> I think that the above paragraph is not relevant any more? > >>>> You are right, however i originally meant to CC the mellanox maintainers as > >>>> i was a bit unsure about the changes i made to their driver. I will rework > >>>> this section in the next revision and CC the mellanox maintainers. > >>>> > >>>>>> This work is also necessary for extending the ACPI thermal zone driver > >>>>>> to support the _TZD ACPI object in the future. > >>>>> I'm still unsure why _TZD support requires the ability to set a > >>>>> thermal zone parent device. > >>>> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans > >>>> and ACPI processors, like ACPI batteries. 
> >>> No, it is not for cooling devices if my reading of the specification
> >>> is correct. It says:
> >>>
> >>> "_TZD (Thermal Zone Devices)
> >>>
> >>> This optional object evaluates to a package of device names. Each name
> >>> corresponds to a device in the ACPI namespace that is associated with
> >>> the thermal zone. The temperature reported by the thermal zone is
> >>> roughly correspondent to that of each of the devices."
> >>>
> >>> And then
> >>>
> >>> "The list of devices returned by the control method need not be a
> >>> complete and absolute list of devices affected by the thermal zone.
> >>> However, the package should at least contain the devices that would
> >>> uniquely identify where this thermal zone is located in the machine.
> >>> For example, a thermal zone in a docking station should include a
> >>> device in the docking station, a thermal zone for the CD-ROM bay,
> >>> should include the CD-ROM."
> >>>
> >>> So IIUC this is a list of devices allowing the location of the thermal
> >>> zone to be figured out. There's nothing about cooling in this
> >>> definition.
> >> Using _TZD to figure out the location of a given thermal zone is another usage
> >> of this ACPI control method, but lets take a look at section 11.6:
> >>
> >> - If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist.
> >> - If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the
> >> _TZD device list or devices' _TZM objects, must support device performance states.
> >>
> >> So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling.
> > But it doesn't actually say how those "device performance states" are
> > supposed to be used for cooling, does it?
>
> Well, ACPI specifies how passive cooling should be done using percentage values between 0% and 100%,
> so this part is actually specified.
If you refer to Section 11.1.5, this is based on _TC1 and _TC2 and has limitations. So you are saying that Section 11.1.5 should be extended to _TZD devices. Is this also there in the MSFT document? > >> This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an > >> extension of _PSL for things like ACPI control method batteries (see 10.2.2.12). > > But not everything in _TZD needs to be a potential "cooling device" > > and how you'll decide which one is? > > Devices in _TZD that have no cooling capability will simply never register any cooling devices. This means that > the .should_bind callback of the ACPI thermal zone will never see those devices. Only devices in _TZD that also > have the ability for (passive) cooling will register a cooling device, so only those devices will end up with > the .should_bind callback of the ACPI thermal zone. > > The ACPI thermal zone treats _TZD as a list of ACPI handles. If some of those handles are unused, then this is > totally fine. > > >> Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide > >> section "Thermally managed devices" paragraph "Processor aggregator"). > > Interesting. > > > > I agree that it would make sense to follow them because there will be > > platform dependencies on that, if there aren't already. > > My primary goal is to improve the Linux thermal subsystem to be as powerful as > the Windows thermal subsystem. This means that we must stop viewing _PSL, _ALx and _TZD > as something that only works with a predefined set of devices. Instead we must view > _PSL, _ALx and _TZD as something similar to the cooling-maps used for connecting > thermal zones and cooling devices on OF-based systems. 
> > >>>> This however will currently not work as > >>>> the ACPI thermal zone driver uses the private drvdata of the cooling device to > >>>> determine if said cooling device should bind. This only works for ACPI fans and > >>>> processors due to the fact that those drivers store a ACPI device pointer inside > >>>> drvdata, something the ACPI thermal zone expects. > >>> I'm not sure I understand the above. > >>> > >>> There is a list of ACPI device handles per trip point, as returned by > >>> either _PSL or _ALx. Devices whose handles are in that list will be > >>> bound to the thermal zone, so long as there are struct acpi_device > >>> objects representing them which is verified with the help of the > >>> devdata field in struct thermal_cooling_device. > >> AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state > >> container struct of the associated device driver instance. Assuming that a given device driver > >> will populate devdata with a pointer to is ACPI companion device is an implementation-specific > >> detail that does not apply to all cooling device implementations. It just so happens that the > >> ACPI processor and fan driver do this, likely because they where designed specifically to work > >> with the ACPI thermal zone driver. > >> > >> The documentation of thermal_cooling_device_register() even describes devdata as "device private data", so any meaning of devdata purely depends on the > >> given device driver. > > Yes, and these particular drivers decide to store a pointer to struct > > acpi_device in it. > > > > But this is not super important, they might as well set the > > ACPI_COMPANION() of the cooling device to the corresponding struct > > acpi_device and the ACPI thermal driver might use that information. 
> >
> > I'm not opposed to using parents for this purpose, but it doesn't
> > change the big picture that the ACPI thermal driver will need to know
> > the ACPI handle corresponding to each cooling device.
> >
> > If you want to use _TZD instead of or in addition to _PSL for this, it
> > doesn't change much here, it's just another list of ACPI handles, so
> > saying that parents are needed for supporting this is not exactly
> > accurate IMV.
>
> My idea was something like this:
>
>     /* Cooling devices without a parent device cannot be referenced using ACPI */
>     if (!cdev->device.parent)
>         return false;
>
>     /* Not all devices are described inside the ACPI tables */
>     acpi_handle cdev_handle = ACPI_HANDLE(cdev->device.parent);
>     if (!cdev_handle)
>         return false;
>
>     for (i = 0; i < acpi_trip->devices.count; i++) {
>         acpi_handle handle = acpi_trip->devices.handles[i];
>
>         if (handle == cdev_handle)
>             return true;
>     }
>
> This only works if the parent device pointer of the cooling device is populated.

Sure, but it looks reasonable to me.

> >>> IOW, cooling device drivers that create struct thermal_cooling_device
> >>> objects representing them are expected to set devdata in those objects
> >>> to point to struct acpi_device objects corresponding to their ACPI
> >>> handles, but in principle acpi_thermal_should_bind_cdev() might as
> >>> well just use the handles themselves. It just needs to know that
> >>> there is a cooling driver on the other side of the ACPI handle.
> >>>
> >>> The point is that a cooling device to be bound to an ACPI thermal zone
> >>> needs an ACPI handle in the first place to be listed in _PSL or _ALx.
> >> Correct, i merely change the way the ACPI thermal zone driver retrieves the
> >> ACPI handle associated with a given cooling device.
> > Right.
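
The handle-matching idea in the quoted snippet ("My idea was something like this") can be tried out as a self-contained user-space sketch. This is only a mock-up under stated assumptions: the mock_ types and helpers are hypothetical stand-ins for the kernel's struct device, acpi_handle and ACPI_HANDLE(), and the final `return false` for the no-match case is an addition that the quoted fragment leaves implicit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for acpi_handle (an opaque pointer). */
typedef void *mock_acpi_handle;

/* Hypothetical stand-in for struct device. */
struct mock_device {
	struct mock_device *parent;
	mock_acpi_handle acpi_handle;	/* NULL if not described in ACPI tables */
};

/* Hypothetical stand-in for struct thermal_cooling_device. */
struct mock_cooling_device {
	struct mock_device device;
};

/* Hypothetical per-trip list of handles, e.g. as returned by _PSL/_ALx/_TZD. */
struct mock_trip_devices {
	size_t count;
	const mock_acpi_handle *handles;
};

/* Plays the role that ACPI_HANDLE(dev) has in the quoted snippet. */
static mock_acpi_handle mock_acpi_handle_of(const struct mock_device *dev)
{
	return dev ? dev->acpi_handle : NULL;
}

static bool mock_should_bind(const struct mock_cooling_device *cdev,
			     const struct mock_trip_devices *devices)
{
	mock_acpi_handle cdev_handle;
	size_t i;

	/* Cooling devices without a parent cannot be referenced using ACPI. */
	if (!cdev->device.parent)
		return false;

	/* Not all parent devices are described inside the ACPI tables. */
	cdev_handle = mock_acpi_handle_of(cdev->device.parent);
	if (!cdev_handle)
		return false;

	for (i = 0; i < devices->count; i++) {
		if (devices->handles[i] == cdev_handle)
			return true;
	}

	/* Assumed behaviour: no listed handle matched, so do not bind. */
	return false;
}
```

Handles that appear in the list but never get a cooling device are simply never matched, so unused entries cost nothing in this scheme.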
> > > >>>> As we cannot require all cooling devices to store an ACPI device pointer inside > >>>> their drvdata field in order to support ACPI, > >>> Cooling devices don't store ACPI device pointers in struct > >>> thermal_cooling_device objects, ACPI cooling drivers do, and there are > >>> two reasons to do that: (1) to associate a given struct > >>> thermal_cooling_device with an ACPI handle and (2) to let > >>> acpi_thermal_should_bind_cdev() know that the cooling device is > >>> present and functional. > >>> > >>> This can be changed to store an ACPI handle in struct > >>> thermal_cooling_device and acpi_thermal_should_bind_cdev() may just > >>> verify that the device is there by itself. > >> I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that > >> can be used for both ACPI and OF based cooling device identification, if this is what you > >> prefer. > > I'm not sure about this ATM and see below. > > > >> This patch series would then turn into a cleanup series, focusing on properly adding > >> thermal zone devices and cooling devices into the global device hierarchy. > > I'd prefer to do one thing at a time though. > > > > If you want cooling devices to get parents, fine. I'm not > > fundamentally opposed to that idea, but let's have clear rules for > > device drivers on how to set those parents for the sake of > > consistency. > > > > As for the ACPI case, one rule that I want to be followed (as already > > stated multiple times) is that a struct acpi_device can only be a > > parent of another struct acpi_device. This means that the parent of a > > cooling device needs to be a platform device or similar representing > > the actual device that will be used for implementing the cooling. > > OK. > > > A separate question is how acpi_thermal_should_bind_cdev() will match > > cooling devices with the ACPI handles coming from _PSL, _ALx, _TZD > > etc. 
and the rule can be that it will look at the ACPI_COMPANION() of > > the parent of the given cooling device. > > See the example code i pasted above, the whole matching is done using ACPI handles, > so we can completely leave ACPI_COMPANION() out of this. ACPI_HANDLE() is a wrapper around ACPI_COMPANION() so your code effectively does what I said above. > >>>> we must use a more generic approach. > >>> I'm not sure what use case you are talking about. > >>> > >>> Surely, devices with no representation in the ACPI namespace cannot be > >>> bound to ACPI thermal zones. For devices that have a representation > >>> in the ACPI namespace, storing an ACPI handle in devdata should not be > >>> a problem. > >> See my above explanations for details, drvdata is defined to hold device private data, > >> nothing more. > > This is related to the discussion below. > > > >>>> I was thinking about using the acpi_handle of the parent device instead of messing > >>>> with the drvdata field, but this only works if the parent device pointer of the > >>>> cooling device is populated. > >>>> > >>>> (Cooling devices without a parent device would then be ignored by the ACPI thermal > >>>> zone driver, as such cooling devices cannot be linked to ACPI). > >>> It can be arranged this way, but what's the practical difference? > >>> Anyone who creates a struct thermal_cooling_device and can set its > >>> parent pointer to a device with an ACPI companion, may as well set its > >>> devdata to point to that companion directly - or to its ACPI handle if > >>> that's preferred. > >> Yes, but this would require explicit support for ACPI in every driver that registers cooling devices. > > So you want to have generic drivers that may work on ACPI platforms > > and on DT platforms to be able to create cooling devices for use with > > ACPI thermal zones. Well, had you started the whole discussion with > > this statement, it would have been much easier to understand your > > point. 
>
> Sorry for the messy discussion, i intended to have two separate patch series. This one was meant to
> simply be a preparation, with the important changes inside the ACPI thermal zone driver being implemented
> with the second patch series.
>
> That was also the reason why i send this series as an RFC.
>
> >> Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle
> >> of their choice when creating a cooling device will fix this.
> > If you go the parents route, this is an important consideration for
> > the rules on how to set those parents. Namely, they would need to be
> > set so that the fwnode_handle of the parent could be used for binding
> > the cooling device to a thermal zone either on ACPI or on DT systems.
> >
> > Of course, there are also cooling devices whose parents will not have
> > an fwnode_handle and they would still need to work in this brave new
> > world.
> >
> True, i did not think of that. In this case extending thermal_of_cooling_device_register() and friends to accept
> a generic fwnode_handle instead of a OF-specific device_node would make more sense. Most drivers can simply
> pass the result of dev_fwnode() instead of dev->of_node, only those that support multiple cooling device child
> nodes would need additional work to also support ACPI.
>
> Basically, thermal_of_get_cooling_spec() could handle the fwnode_handle in the following manner:
>
>     if (cooling_spec.np->fwnode != cdev->fwnode)
>         return false;
>
> And the ACPI thermal zone driver could then simply use ACPI_HANDLE_FWNODE() to retrieve the ACPI handle from
> the fwnode_handle (together with a NULL check of course).
>
> If you are OK with this approach, i will forget about the whole parent device stuff for now and focus on extending
> (devm_)thermal_of_cooling_device_register().
> There are some additional changes needed for reliably associating
> cooling devices to ACPI trip points using fwnode handles, but those are not that intrusive.
>
> What do you think?

One advantage of using parents is that it will help user space to figure
out connections between the abstract cooling devices and the associated
hardware or firmware entities. I think that this is an important one.

It also doesn't prevent fwnode_handle from being used because the
fwnode_handle may just be stored in the parent. I like this more than
associating fwnode_handles directly with abstract cooling devices.

If the cooling device parent (that is, the provider of the cooling
mechanism used by it) does not have an fwnode_handle, then either it
needs to be driven directly from user space, or the driver creating a
thermal zone device needs to provide a specific .should_bind() callback
that will know what to look for.

From W_Armin at gmx.de Sat Nov 29 03:35:58 2025
From: W_Armin at gmx.de (Armin Wolf)
Date: Sat, 29 Nov 2025 12:35:58 +0100
Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices
In-Reply-To:
References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de>
Message-ID:

Am 28.11.25 um 12:40 schrieb Rafael J. Wysocki:
> On Fri, Nov 28, 2025 at 12:50 AM Armin Wolf wrote:
>> Am 27.11.25 um 22:46 schrieb Rafael J. Wysocki:
>>
>>> On Thu, Nov 27, 2025 at 9:06 PM Armin Wolf wrote:
>>>> Am 27.11.25 um 18:41 schrieb Rafael J. Wysocki:
>>>>
>>>>> On Sat, Nov 22, 2025 at 3:18 PM Armin Wolf wrote:
>>>>>> Am 21.11.25 um 21:35 schrieb Rafael J.
Wysocki: >>>>>> >>>>>>> On Thu, Nov 20, 2025 at 4:41?AM Armin Wolf wrote: >>>>>>>> Drivers registering thermal zone/cooling devices are currently unable >>>>>>>> to tell the thermal core what parent device the new thermal zone/ >>>>>>>> cooling device should have, potentially causing issues with suspend >>>>>>>> ordering >>>>>>> This is one potential class of problems that may arise, but I would >>>>>>> like to see a real example of this. >>>>>>> >>>>>>> As it stands today, thermal_class has no PM callbacks, so there are no >>>>>>> callback execution ordering issues with devices in that class and what >>>>>>> other suspend/resume ordering issues are there? >>>>>> Correct, that is why i said "potentially". >>>>>> >>>>>>> Also, the suspend and resume of thermal zones is handled via PM >>>>>>> notifiers. Is there a problem with this? >>>>>> The problem with PM notifiers is that thermal zones stop working even before >>>>>> user space is frozen. Freezing user space might take a lot of time, so having >>>>>> no thermal management during this period is less than ideal. >>>>> This can be addressed by doing thermal zone suspend after freezing >>>>> tasks and before starting to suspend devices. Accordingly, thermal >>>>> zones could be resumed after resuming devices and before thawing >>>>> tasks. That should not be an overly complex change to make. >>>> AFAIK this is only possible by using dev_pm_ops, >>> Of course it is not the case. >>> >>> For example, thermal_pm_notify_prepare() could be called directly from >>> dpm_prepare() and thermal_pm_notify_complete() could be called >>> directly from dpm_complete() (which would require switching over >>> thermal to a non-freezable workqueue). >>> >>>> the PM notifier is triggered before tasks are frozen during suspend and after they are thawed during resume. >>> I know that. 
>>> >>>> Using dev_pm_ops would also ensure that thermal zone devices are resumed after their >>>> parent devices, so no additional changes inside the pm core would be needed. >>> Not really. thermal_pm_suspended needs to be set and cleared from somewhere. >> thermal_pm_suspended is only used for initializing the state of thermal zone devices registered >> during a suspend transition. This is currently needed because user space tasks are still operational >> when the PM notifier callback is called, so we have to be prepared for new thermal zone devices >> being registered in the middle of a suspend transition. >> >> When using dev_pm_ops, new thermal zone devices cannot appear in the middle of a suspend transition, >> as this would violate the restraints of the device core regarding device registrations. Because of >> this thermal_pm_suspended can be removed once we use dev_pm_ops. > No, we are not going to use dev_pm_ops for thermal zone suspend. That > would be adding complexity just for the sake of it IMV. OK, fine. I will forget about using dev_pm_ops for the thermal subsystem. >>>>>> This problem would not occur when using dev_pm_ops, as thermal zones would be >>>>>> suspended after user space has been frozen successfully. Additionally, when using >>>>>> dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already mandates >>>>>> that no new devices (including thermal zones and cooling devices) be registered during >>>>>> a suspend/resume cycle. >>>>>> >>>>>> Replacing the PM notifiers with dev_pm_ops would of course be a optimization with >>>>>> its own patch series. >>>>> Honestly, I don't see much benefit from using dev_pm_ops for thermal >>>>> zone devices and cooling devices. Moreover, I actually think that >>>>> they could be "no PM" devices that are not even put on the >>>>> suspend-resume device list. 
Technically, they are just interfaces on >>>>> top of some other devices allowing the user space to interact with the >>>>> latter and combining different pieces described by the platform >>>>> firmware. They by themselves have no PM capabilities. >>>> Correct, thermal zone devices are virtual devices representing thermal management >>>> aspects of the underlying parent device. This however does not mean that thermal zone >>>> devices have no PM capabilities, because they contain state. Some part of this state >>>> (namely TZ_STATE_FLAG_SUSPENDED and TZ_STATE_FLAG_RESUMING) is affected by power management, >>>> so we should tell the device core about this by using dev_pm_ops instead of the PM notifier. >>> Changing the zone state to anything different from TZ_STATE_READY >>> causes __thermal_zone_device_update() to do nothing and this is the >>> whole "suspend". It does not need to be done from a PM callback and I >>> see no reason why doing it from a PM callback would be desirable. >>> Sorry. >>> >>> Apart from the above, TZ_STATE_FLAG_SUSPENDED and >>> TZ_STATE_FLAG_RESUMING are only used for coordination between >>> thermal_zone_pm_prepare(), thermal_zone_device_resume() and >>> thermal_zone_pm_complete(), so this is not a state anything other then >>> the specific thermal zone in question cares about. >> AFAIK this is not completely true, once TZ_STATE_FLAG_SUSPENDED is set, >> __thermal_zone_device_update() will stop polling said device (as you said). >> This is not only important for the thermal zone device itself, but also for >> the underlying device driver as he has to make sure that the thermal zone >> callbacks do not access an already suspended hardware device. > Which callbacks in particular do you mean? That would need to be > something that is not called from either > __thermal_zone_device_update() because it is going to bail out early > or user space because it is frozen. So what is left? 
> > Seriously, if the only problem with the existing thermal zone suspend > and resume is that they are done from a PM notifier, I don't think > addressing this requires involving dev_pm_ops and it will be very hard > to convince me otherwise. I was referring to the callbacks inside struct thermal_zone_device_ops, but those are indeed already covered by the current approach using the PM notifier. Since you are happy with the current approach, i say that we forget about the suggestion with the dev_pm_ops for now. >>> Moreover, resuming a thermal zone before resuming any cooling devices >>> bound to it would almost certainly break things and I'm not sure how >>> you would make that work with dev_pm_ops. BTW, using device links for >>> this is not an option as far as I'm concerned. >> We could simply resume the thermal zones inside the .complete callback. >> The cooling devices will already be operational when said complete callback >> is being called by the PM core, due to the resume phase having been completed >> already. > But then it would be synchronous, wouldn't it? Or if you want to > start async handling from a .complete callback then I don't see a > point. > >>>>>>>> and making it impossible for user space applications to >>>>>>>> associate a given thermal zone device with its parent device. >>>>>>> Why does user space need to know the parent of a given cooling device >>>>>>> or thermal zone? >>>>>> Lets say that we have two thermal zones registered by two instances of the >>>>>> Intel Wifi driver. User space is currently unable to find out which thermal zone >>>>>> belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string ("iwlwifi[0-X]"). >>>>> But the "belong" part is not quite well defined here. I think that >>>>> what user space needs to know is what devices are located in a given >>>>> thermal zone, isn't it? Knowing the parent doesn't necessarily >>>>> address this. 
>>>> The device exposing a given thermal zone device is not always a member of the thermal zone itself. >>>> In case of the Intel Wifi adapters, the individual Wifi adapters are indeed members of the thermal zone >>>> associated with their thermal zone device. But thermal zones created thru a system management controller >>>> for example might only cover devices like the CPUs and GPUs, not the system management controller device itself. >>> Well, exactly. >>> >>>> The parent device of a child device is the upstream device of the child device. The connection between parent >>>> and child can be physical (SMBus controller (parent) -> i2c device (child)) or purely logical >>>> (PCI device (parent) -> thermal zone device (child)). There exists a parent-child dependency between a parent >>>> and a child device (the child device cannot function without its parent being operational), and user space >>>> might want to be able to discover such dependencies. >>> But this needs to be consistent. >>> >>> If the parent of one thermal zone represents the device affected by it >>> and the parent of another thermal zone represents something else, user >>> space will need platform-specific knowledge to figure this out, which >>> is the case today. Without consistency, this is just not useful. >> I think there is a misunderstanding here, describing the devices affected by a given thermal zone >> has nothing to do with the parent-child dependency between a thermal zone device and its parent device. >> This parent-child dependency only states that: >> >> "This thermal zone device is descended from this parent device. It might thus depend on >> said parent device to be operational." > So you are postulating that the parent of a thermal zone should be the > device providing the thermal sensor or otherwise a mechanism allowing > temperature to be read. That is precise enough as far as I'm > concerned. Correct. 
>>>>>> This problem would be solved once we populate the parent device pointer inside the thermal zone >>>>>> device, as user space can simply look at the "device" symlink to determine the parent device behind >>>>>> a given thermal zone device. >>>>> I'm not convinced about this. >>>>> >>>>>> Additionally, being able to access the acpi_handle of the parent device will be necessary for the >>>>>> ACPI thermal zone driver to support cooling devices other than ACPI fans and ACPI processors. >>>>> I guess by the "parent" you mean the device represented in the ACPI >>>>> namespace by a ThermalZone object, right? But this is not the same as >>>>> the "parent" in the Wifi driver context, is it? >>>> In the context of an ACPI ThermalZone, the parent device of the thermal cooling device would currently >>>> be the ACPI device bound to the "thermal" ACPI driver. In the context of the Intel Wifi card, the parent >>>> device would be the PCI device bound to the corresponding Intel Wifi driver. >>>> >>>> I think you misunderstood what kind of parent device i was referring to. You likely thought that i was referring >>>> to the parent device of the ACPI ThermalZone, right? >>> No. I thought that you were referring to the ACPI ThermalZone itself. >>> Or rather, a platform device associated with the ACPI ThermalZone >>> (that is, the device the ACPI ThermalZone is the ACPI_COMPANION() of). >> That is correct. >> >>>> That however is not the case, with "parent device" i was >>>> referring to the device responsible for creating a given struct thermal_zone_device instance. >>> So I was not confused. >>> >>>>>>>> This patch series aims to fix this issue by extending the functions >>>>>>>> used to register thermal zone/cooling devices to also accept a parent >>>>>>>> device pointer. The first six patches convert all functions used for >>>>>>>> registering cooling devices, while the functions used for registering >>>>>>>> thermal zone devices are converted by the remaining two patches.
>>>>>>>> >>>>>>>> I tested this series on various devices containing (among others): >>>>>>>> - ACPI thermal zones >>>>>>>> - ACPI processor devices >>>>>>>> - PCIe cooling devices >>>>>>>> - Intel Wifi card >>>>>>>> - Intel powerclamp >>>>>>>> - Intel TCC cooling >>>>>>> What exactly did you do to test it? >>>>>> I tested: >>>>>> - the thermal zone temperature readout >>>>>> - correctness of the new sysfs links >>>>>> - suspend/resume >>>>>> >>>>>> I also verified that ACPI thermal zones still bind with the ACPI fans. >>>>> I see, thanks. >>>>> >>>>>>>> I also compile-tested the remaining affected drivers, however i would >>>>>>>> still be happy if the relevant maintainers (especially those of the >>>>>>>> mellanox ethernet switch driver) could take a quick glance at the >>>>>>>> code and verify that i am using the correct device as the parent >>>>>>>> device. >>>>>>> I think that the above paragraph is not relevant any more? >>>>>> You are right, however i originally meant to CC the mellanox maintainers as >>>>>> i was a bit unsure about the changes i made to their driver. I will rework >>>>>> this section in the next revision and CC the mellanox maintainers. >>>>>> >>>>>>>> This work is also necessary for extending the ACPI thermal zone driver >>>>>>>> to support the _TZD ACPI object in the future. >>>>>>> I'm still unsure why _TZD support requires the ability to set a >>>>>>> thermal zone parent device. >>>>>> _TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI fans >>>>>> and ACPI processors, like ACPI batteries. >>>>> No, it is not for cooling devices if my reading of the specification >>>>> is correct. It says: >>>>> >>>>> "_TZD (Thermal Zone Devices) >>>>> >>>>> This optional object evaluates to a package of device names. Each name >>>>> corresponds to a device in the ACPI namespace that is associated with >>>>> the thermal zone. 
The temperature reported by the thermal zone is >>>>> roughly correspondent to that of each of the devices." >>>>> >>>>> And then >>>>> >>>>> "The list of devices returned by the control method need not be a >>>>> complete and absolute list of devices affected by the thermal zone. >>>>> However, the package should at least contain the devices that would >>>>> uniquely identify where this thermal zone is located in the machine. >>>>> For example, a thermal zone in a docking station should include a >>>>> device in the docking station, a thermal zone for the CD-ROM bay, >>>>> should include the CD-ROM." >>>>> >>>>> So IIUC this is a list of devices allowing the location of the thermal >>>>> zone to be figured out. There's nothing about cooling in this >>>>> definition. >>>> Using _TZD to figure out the location of a given thermal zone is another usage >>>> of this ACPI control method, but let's take a look at section 11.6: >>>> >>>> - If _PSV is defined then either the _PSL or _TZD objects must exist. The _PSL and _TZD objects may both exist. >>>> - If _PSV is defined and _PSL is not defined then at least one device in thermal zone, as indicated by either the >>>> _TZD device list or devices' _TZM objects, must support device performance states. >>>> >>>> So according to my understanding, _TZD can also be used to discover additional cooling devices used for passive cooling. >>> But it doesn't actually say how those "device performance states" are >>> supposed to be used for cooling, does it? >> Well, ACPI specifies how passive cooling should be done using percentage values between 0% and 100%, >> so this part is actually specified. > If you refer to Section 11.1.5, this is based on _TC1 and _TC2 and has > limitations. So you are saying that Section 11.1.5 should be extended > to _TZD devices. Is this also there in the MSFT document?
Looking at https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide section "Thermal policy control" paragraph "Thermal manager in kernel", it seems that the NT kernel uses the passive cooling algorithm defined by the ACPI specification for all passive cooling devices. So when using Windows, _TZD is indeed treated like an extension for _PSL. >>>> This makes sense as _PSL is defined to only contain processor objects (see section 11.4.10), so _TZD can act like an >>>> extension of _PSL for things like ACPI control method batteries (see 10.2.2.12). >>> But not everything in _TZD needs to be a potential "cooling device" >>> and how you'll decide which one is? >> Devices in _TZD that have no cooling capability will simply never register any cooling devices. This means that >> the .should_bind callback of the ACPI thermal zone will never see those devices. Only devices in _TZD that also >> have the ability for (passive) cooling will register a cooling device, so only those devices will end up with >> the .should_bind callback of the ACPI thermal zone. >> >> The ACPI thermal zone treats _TZD as a list of ACPI handles. If some of those handles are unused, then this is >> totally fine. >> >>>> Microsoft also follows this approach (see https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide >>>> section "Thermally managed devices" paragraph "Processor aggregator"). >>> Interesting. >>> >>> I agree that it would make sense to follow them because there will be >>> platform dependencies on that, if there aren't already. >> My primary goal is to improve the Linux thermal subsystem to be as powerful as >> the Windows thermal subsystem. This means that we must stop viewing _PSL, _ALx and _TZD >> as something that only works with a predefined set of devices. 
Instead we must view >> _PSL, _ALx and _TZD as something similar to the cooling-maps used for connecting >> thermal zones and cooling devices on OF-based systems. >> >>>>>> This however will currently not work as >>>>>> the ACPI thermal zone driver uses the private drvdata of the cooling device to >>>>>> determine if said cooling device should bind. This only works for ACPI fans and >>>>>> processors due to the fact that those drivers store an ACPI device pointer inside >>>>>> drvdata, something the ACPI thermal zone expects. >>>>> I'm not sure I understand the above. >>>>> >>>>> There is a list of ACPI device handles per trip point, as returned by >>>>> either _PSL or _ALx. Devices whose handles are in that list will be >>>>> bound to the thermal zone, so long as there are struct acpi_device >>>>> objects representing them which is verified with the help of the >>>>> devdata field in struct thermal_cooling_device. >>>> AFAIK devdata is meant to be used by the thermal zone device callbacks to access the state >>>> container struct of the associated device driver instance. Assuming that a given device driver >>>> will populate devdata with a pointer to its ACPI companion device is an implementation-specific >>>> detail that does not apply to all cooling device implementations. It just so happens that the >>>> ACPI processor and fan driver do this, likely because they were designed specifically to work >>>> with the ACPI thermal zone driver. >>>> >>>> The documentation of thermal_cooling_device_register() even describes devdata as "device private data", so any meaning of devdata purely depends on the >>>> given device driver. >>> Yes, and these particular drivers decide to store a pointer to struct >>> acpi_device in it. >>> >>> But this is not super important, they might as well set the >>> ACPI_COMPANION() of the cooling device to the corresponding struct >>> acpi_device and the ACPI thermal driver might use that information.
>>> >>> I'm not opposed to using parents for this purpose, but it doesn't >>> change the big picture that the ACPI thermal driver will need to know >>> the ACPI handle corresponding to each cooling device. >>> >>> If you want to use _TZD instead of or in addition to _PSL for this, it >>> doesn't change much here, it's just another list of ACPI handles, so >>> saying that parents are needed for supporting this is not exactly >>> accurate IMV. >> My idea was something like this:
>>
>> 	/* Cooling devices without a parent device cannot be referenced using ACPI */
>> 	if (!cdev->device.parent)
>> 		return false;
>>
>> 	/* Not all devices are described inside the ACPI tables */
>> 	acpi_handle cdev_handle = ACPI_HANDLE(cdev->device.parent);
>> 	if (!cdev_handle)
>> 		return false;
>>
>> 	for (i = 0; i < acpi_trip->devices.count; i++) {
>> 		acpi_handle handle = acpi_trip->devices.handles[i];
>>
>> 		if (handle == cdev_handle)
>> 			return true;
>> 	}
>>
>> This only works if the parent device pointer of the cooling device is populated. > Sure, but it looks reasonable to me. > >>>>> IOW, cooling device drivers that create struct thermal_cooling_device >>>>> objects representing them are expected to set devdata in those objects >>>>> to point to struct acpi_device objects corresponding to their ACPI >>>>> handles, but in principle acpi_thermal_should_bind_cdev() might as >>>>> well just use the handles themselves. It just needs to know that >>>>> there is a cooling driver on the other side of the ACPI handle. >>>>> >>>>> The point is that a cooling device to be bound to an ACPI thermal zone >>>>> needs an ACPI handle in the first place to be listed in _PSL or _ALx. >>>> Correct, i merely change the way the ACPI thermal zone driver retrieves the >>>> ACPI handle associated with a given cooling device. >>> Right.
>>> >>>>>> As we cannot require all cooling devices to store an ACPI device pointer inside >>>>>> their drvdata field in order to support ACPI, >>>>> Cooling devices don't store ACPI device pointers in struct >>>>> thermal_cooling_device objects, ACPI cooling drivers do, and there are >>>>> two reasons to do that: (1) to associate a given struct >>>>> thermal_cooling_device with an ACPI handle and (2) to let >>>>> acpi_thermal_should_bind_cdev() know that the cooling device is >>>>> present and functional. >>>>> >>>>> This can be changed to store an ACPI handle in struct >>>>> thermal_cooling_device and acpi_thermal_should_bind_cdev() may just >>>>> verify that the device is there by itself. >>>> I can of course extend thermal_cooling_device_register() to accept a fwnode_handle that >>>> can be used for both ACPI and OF based cooling device identification, if this is what you >>>> prefer. >>> I'm not sure about this ATM and see below. >>> >>>> This patch series would then turn into a cleanup series, focusing on properly adding >>>> thermal zone devices and cooling devices into the global device hierarchy. >>> I'd prefer to do one thing at a time though. >>> >>> If you want cooling devices to get parents, fine. I'm not >>> fundamentally opposed to that idea, but let's have clear rules for >>> device drivers on how to set those parents for the sake of >>> consistency. >>> >>> As for the ACPI case, one rule that I want to be followed (as already >>> stated multiple times) is that a struct acpi_device can only be a >>> parent of another struct acpi_device. This means that the parent of a >>> cooling device needs to be a platform device or similar representing >>> the actual device that will be used for implementing the cooling. >> OK. >> >>> A separate question is how acpi_thermal_should_bind_cdev() will match >>> cooling devices with the ACPI handles coming from _PSL, _ALx, _TZD >>> etc. 
and the rule can be that it will look at the ACPI_COMPANION() of >>> the parent of the given cooling device. >> See the example code i pasted above, the whole matching is done using ACPI handles, >> so we can completely leave ACPI_COMPANION() out of this. > ACPI_HANDLE() is a wrapper around ACPI_COMPANION() so your code > effectively does what I said above. True, i forgot about that. >>>>>> we must use a more generic approach. >>>>> I'm not sure what use case you are talking about. >>>>> >>>>> Surely, devices with no representation in the ACPI namespace cannot be >>>>> bound to ACPI thermal zones. For devices that have a representation >>>>> in the ACPI namespace, storing an ACPI handle in devdata should not be >>>>> a problem. >>>> See my above explanations for details, drvdata is defined to hold device private data, >>>> nothing more. >>> This is related to the discussion below. >>> >>>>>> I was thinking about using the acpi_handle of the parent device instead of messing >>>>>> with the drvdata field, but this only works if the parent device pointer of the >>>>>> cooling device is populated. >>>>>> >>>>>> (Cooling devices without a parent device would then be ignored by the ACPI thermal >>>>>> zone driver, as such cooling devices cannot be linked to ACPI). >>>>> It can be arranged this way, but what's the practical difference? >>>>> Anyone who creates a struct thermal_cooling_device and can set its >>>>> parent pointer to a device with an ACPI companion, may as well set its >>>>> devdata to point to that companion directly - or to its ACPI handle if >>>>> that's preferred. >>>> Yes, but this would require explicit support for ACPI in every driver that registers cooling devices. >>> So you want to have generic drivers that may work on ACPI platforms >>> and on DT platforms to be able to create cooling devices for use with >>> ACPI thermal zones. 
Well, had you started the whole discussion with >>> this statement, it would have been much easier to understand your >>> point. >> Sorry for the messy discussion, i intended to have two separate patch series. This one was meant to >> simply be a preparation, with the important changes inside the ACPI thermal zone driver being implemented >> with the second patch series. >> >> That was also the reason why i sent this series as an RFC. >> >>>> Using the parent device to retrieve the acpi_handle or allowing all drivers to just submit a fwnode_handle >>>> of their choice when creating a cooling device will fix this. >>> If you go the parents route, this is an important consideration for >>> the rules on how to set those parents. Namely, they would need to be >>> set so that the fwnode_handle of the parent could be used for binding >>> the cooling device to a thermal zone either on ACPI or on DT systems. >>> >>> Of course, there are also cooling devices whose parents will not have >>> an fwnode_handle and they would still need to work in this brave new >>> world. >>> >> True, i did not think of that. In this case extending thermal_of_cooling_device_register() and friends to accept >> a generic fwnode_handle instead of an OF-specific device_node would make more sense. Most drivers can simply >> pass the result of dev_fwnode() instead of dev->of_node, only those that support multiple cooling device child >> nodes would need additional work to also support ACPI. >> >> Basically, thermal_of_get_cooling_spec() could handle the fwnode_handle in the following manner:
>>
>> 	if (cooling_spec.np->fwnode != cdev->fwnode)
>> 		return false;
>>
>> And the ACPI thermal zone driver could then simply use ACPI_HANDLE_FWNODE() to retrieve the ACPI handle from >> the fwnode_handle (together with a NULL check of course). >> >> If you are OK with this approach, i will forget about the whole parent device stuff for now and focus on extending >> (devm_)thermal_of_cooling_device_register().
There are some additional changes needed for reliably associating >> cooling devices to ACPI trip points using fwnode handles, but those are not that intrusive. >> >> What do you think? > One advantage of using parents is that it will help user space to > figure out connections between the abstract cooling devices and the > associated hardware or firmware entities. I think that this is an > important one. > > It also doesn't prevent fwnode_handle from being used because the > fwnode_handle may just be stored in the parent. I like this more than > associating fwnode_handles directly with abstract cooling devices. > > If the cooling device parent (that is, the provider of the cooling > mechanism used by it) does not have an fwnode_handle, then either it > needs to be driven directly from user space, or the driver creating a > thermal zone device needs to provide a specific .should_bind() > callback that will know what to look for. > OK. When sending the next revision of this patch series, should i also keep the patches for the thermal zone device or should i only keep the patches concerning the cooling devices? Thanks, Armin Wolf From rafael at kernel.org Sun Nov 30 04:55:24 2025 From: rafael at kernel.org (Rafael J. Wysocki) Date: Sun, 30 Nov 2025 13:55:24 +0100 Subject: [PATCH RFC RESEND 0/8] thermal: core: Allow setting the parent device of thermal zone/cooling devices In-Reply-To: References: <20251120-thermal-device-v1-0-bbdad594d57a@gmx.de> <5f3ef610-4024-4ca0-a934-2649f5d25f40@gmx.de> Message-ID: On Sat, Nov 29, 2025 at 12:36?PM Armin Wolf wrote: > > Am 28.11.25 um 12:40 schrieb Rafael J. Wysocki: > > > On Fri, Nov 28, 2025 at 12:50?AM Armin Wolf wrote: > >> Am 27.11.25 um 22:46 schrieb Rafael J. Wysocki: [cut] > >> What do you think? > > One advantage of using parents is that it will help user space to > > figure out connections between the abstract cooling devices and the > > associated hardware or firmware entities. 
I think that this is an > > important one. > > > > It also doesn't prevent fwnode_handle from being used because the > > fwnode_handle may just be stored in the parent. I like this more than > > associating fwnode_handles directly with abstract cooling devices. > > > > If the cooling device parent (that is, the provider of the cooling > > mechanism used by it) does not have an fwnode_handle, then either it > > needs to be driven directly from user space, or the driver creating a > > thermal zone device needs to provide a specific .should_bind() > > callback that will know what to look for. > > > OK. When sending the next revision of this patch series, should i also keep > the patches for the thermal zone device or should i only keep the patches > concerning the cooling devices? The cooling device changes are kind of unrelated to the thermal zone device changes, so it would be better to send them as separate series, but you may as well send those series at the same time as far as I'm concerned.
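
For reference, the parent-based matching discussed in this thread can be modelled outside the kernel. The following is only a rough userspace sketch under stated assumptions: mock_handle, struct mock_device, struct mock_trip and mock_should_bind() are hypothetical stand-ins invented for illustration, not kernel types or APIs, and fw_handle merely plays the role that ACPI_HANDLE() of the parent device plays in the snippet quoted earlier.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an opaque firmware handle (not the kernel type). */
typedef void *mock_handle;

/* Hypothetical, reduced device model: only what the sketch needs. */
struct mock_device {
	struct mock_device *parent;	/* NULL if the device has no parent */
	mock_handle fw_handle;		/* NULL if not described by firmware */
};

/* Hypothetical trip point: handle list as it might come from _PSL, _ALx or _TZD. */
struct mock_trip {
	const mock_handle *handles;
	size_t count;
};

/*
 * Bind only if the cooling device has a parent, the parent has a firmware
 * handle, and that handle is listed for the trip point.
 */
static bool mock_should_bind(const struct mock_trip *trip,
			     const struct mock_device *cdev)
{
	mock_handle cdev_handle;
	size_t i;

	/* Cooling devices without a parent cannot be matched this way. */
	if (!cdev->parent)
		return false;

	/* Not every parent is described by firmware. */
	cdev_handle = cdev->parent->fw_handle;
	if (!cdev_handle)
		return false;

	for (i = 0; i < trip->count; i++) {
		if (trip->handles[i] == cdev_handle)
			return true;
	}

	return false;
}
```

In this model a cooling device without a parent, or with a parent that has no firmware handle, is never bound, which corresponds to the case where a driver-specific .should_bind() callback or user space would have to take over.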