[PATCH] nvme: hwmon: Add support for throttling temperature feature

Armin Wolf W_Armin at gmx.de
Sat Aug 6 13:19:20 PDT 2022


On 06.08.22 at 13:58, Tokunori Ikegami wrote:

> Note: Sorry, let me resend the mail below in text format, since it was
> not delivered to the mailing lists because it contained an HTML subpart.
>
> Hi,
>
> Thanks for your comments.
>
> On 2022/08/06 17:31, Guenter Roeck wrote:
>> On Sat, Aug 06, 2022 at 02:46:06PM +0900, Tokunori Ikegami wrote:
>>> NVMe drives optionally support the host controlled thermal management
>>> feature.
>>> The thermal management temperatures are different from the
>>> temperature threshold.
>>> So add functionality to set the throttling temperature values.
>>>
>>> Signed-off-by: Tokunori Ikegami <ikegami.t at gmail.com>
>
> Actually, I think the suggested attributes do not match the
> throttling temperatures, as shown below.
>
>   temp[1-*]_emergency: Temperature emergency max value, for chips
> supporting more than two upper temperature limits.
>   temp[1-*]_lcrit: Temperature critical min value, typically lower
> than corresponding temp_min values.
>
>   Thermal Management Temperature 1 (TMT1): This field specifies the
> temperature, in Kelvins, when the controller begins to transition to
> lower power active power states or performs vendor specific thermal
> management actions while minimizing the impact on performance (e.g.,
> light throttling) in order to attempt to reduce the Composite
> Temperature.
>   Thermal Management Temperature 2 (TMT2): This field specifies the
> temperature, in Kelvins, when the controller begins to transition to
> lower power active power states or perform vendor specific thermal
> management actions regardless of the impact on performance (e.g.,
> heavy throttling) in order to attempt to reduce the Composite
> Temperature.
>
Maybe those two throttle thresholds could be represented by tempX_crit and tempX_emergency.
The special throttle effect could then be documented in the driver's documentation.

Since tempX_crit is already used to report CCTEMP, maybe CCTEMP could be reported with tempX_rated_max instead?
As far as I know, CCTEMP is the maximum composite temperature rating of the NVMe device, so reporting it as tempX_rated_max would make sense.
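
To make the proposed mapping concrete, here is a minimal sketch in plain C. It is not the driver code: the table and the helper names (proposed_map, hwmon_attr_for, kelvin_to_millicelsius, millicelsius_to_kelvin) are made up for illustration, and the integer 273 offset is an assumption. It only shows the renaming suggested above, plus the unit conversion needed because TMT1/TMT2 are specified in Kelvins while hwmon sysfs attributes use millidegrees Celsius.

```c
#include <stddef.h>
#include <string.h>

/*
 * Unit conversion between NVMe temperature fields (Kelvin) and hwmon
 * sysfs values (millidegrees Celsius). The integer offset of 273 is an
 * assumption made for this sketch.
 */
static long kelvin_to_millicelsius(long kelvin)
{
	return (kelvin - 273) * 1000;
}

static long millicelsius_to_kelvin(long millicelsius)
{
	/* Truncates toward zero; a real driver may prefer rounding. */
	return millicelsius / 1000 + 273;
}

/* Hypothetical table describing the mapping proposed in this mail. */
struct limit_map {
	const char *nvme_field;
	const char *hwmon_attr;
};

static const struct limit_map proposed_map[] = {
	{ "TMT1",   "tempX_crit" },      /* light throttling threshold */
	{ "TMT2",   "tempX_emergency" }, /* heavy throttling threshold */
	{ "CCTEMP", "tempX_rated_max" }, /* moved away from tempX_crit */
};

/* Return the proposed hwmon attribute name, or NULL if not mapped. */
static const char *hwmon_attr_for(const char *nvme_field)
{
	size_t i;

	for (i = 0; i < sizeof(proposed_map) / sizeof(proposed_map[0]); i++) {
		if (strcmp(proposed_map[i].nvme_field, nvme_field) == 0)
			return proposed_map[i].hwmon_attr;
	}
	return NULL;
}
```

With this sketch, a TMT2 value of 353 K would be shown to userspace as 80000 in tempX_emergency, and a write of 80000 would be converted back to 353 K before being sent to the device.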

Armin Wolf

>> NACK. There are several existing limit attributes which can be used
>> for this purpose. I would suggest to use EMERGENCY and LCRIT attributes.
>>
>> Furthermore, one can not just extend the hwmon ABI without discussion,
>> much less as part of a patch introducing its use. Any attribute
>> introduced into the ABI must benefit more than one device, and a
>> matching implementation in the sensors command and the lm-sensors
>> library is expected.
>
> Sorry, I am not sure about the hwmon ABI situation, but if possible
> could you please consider or discuss extending the attributes as part of
> this patch review, since the suggested attributes seem difficult to use
> instead? (Is that difficult?)
> By the way I have already created the lm-sensors pull request below.
>   <https://github.com/lm-sensors/lm-sensors/pull/406>
>
> Regards,
> Ikegami
>
>>
>> Guenter


