[PATCH v1] PM / devfreq: Add HiSilicon uncore frequency scaling driver

Jie Zhan zhanjie9 at hisilicon.com
Mon May 19 19:22:26 PDT 2025



On 20/05/2025 09:36, lihuisong (C) wrote:
> 
> 在 2025/5/19 16:03, Jie Zhan 写道:
>> Hi Huisong,
>>
>> Thanks for reviewing.
>>
>> On 12/05/2025 10:17, lihuisong (C) wrote:
>>> Hi Jie,
>>>
>>> There are some trivia advice. please have a look.
>>>
>>> /Huisong
>>> 在 2025/5/6 10:14, Jie Zhan 写道:
>>>> Add the HiSilicon uncore frequency scaling driver for Kunpeng SoCs based on
>>>> the devfreq framework.  The uncore domain contains shared computing
>>>> resources, including system interconnects and L3 cache.  The uncore
>>>> frequency significantly impacts the system-wide performance as well as
>>>> power consumption.  This driver adds support for runtime management of
>>>> uncore frequency from kernel and userspace.  The main function includes
>>>> setting and getting frequencies, changing frequency scaling policies, and
>>>> querying the list of CPUs whose performance is significantly related to
>>>> this uncore frequency domain, etc.  The driver communicates with a platform
>>>> controller through an ACPI PCC mailbox to take the actual actions of
>>>> frequency scaling.
>>>>
>>>> Co-developed-by: Lifeng Zheng <zhenglifeng1 at huawei.com>
>>>> Signed-off-by: Lifeng Zheng <zhenglifeng1 at huawei.com>
>>>> Signed-off-by: Jie Zhan <zhanjie9 at hisilicon.com>
>>>> ---
>>>>    drivers/devfreq/Kconfig            |  11 +
>>>>    drivers/devfreq/Makefile           |   1 +
>>>>    drivers/devfreq/hisi_uncore_freq.c | 722 +++++++++++++++++++++++++++++
>>>>    3 files changed, 734 insertions(+)
>>>>    create mode 100644 drivers/devfreq/hisi_uncore_freq.c
>>>>
>> ...
>>
>>>> +
>>>> +static int hisi_uncore_request_pcc_chan(struct hisi_uncore_freq *uncore)
>>>> +{
>>>> +    struct pcc_mbox_chan *pcc_chan;
>>>> +    int rc;
>>>> +
>>>> +    uncore->cl = (struct mbox_client) {
>>>> +        .dev = uncore->dev,
>>>> +        .tx_block = false,
>>>> +        .knows_txdone = true,
>>>> +    };
>>>> +
>>>> +    pcc_chan = pcc_mbox_request_channel(&uncore->cl, uncore->chan_id);
>>>> +    if (IS_ERR(pcc_chan)) {
>>>> +        dev_err(uncore->dev, "Failed to request PCC channel %u\n",
>>>> +            uncore->chan_id);
>>>> +        return -ENODEV;
>>>> +    }
>>>> +
>>>> +    if (!pcc_chan->shmem_base_addr) {
>>>> +        dev_err(uncore->dev, "Invalid PCC shared memory address\n");
>>>> +        rc = -EINVAL;
>>>> +        goto err_pcc_chan_free;
>>>> +    }
>>>> +
>>>> +    if (pcc_chan->shmem_size < sizeof(struct hisi_uncore_pcc_shmem)) {
>>>> +        dev_err(uncore->dev, "Invalid PCC shared memory size (%lluB)\n",
>>>> +            pcc_chan->shmem_size);
>>>> +        rc = -EINVAL;
>>>> +        goto err_pcc_chan_free;
>>>> +    }
>>>> +
>>>> +    uncore->pcc_shmem_addr = ioremap(pcc_chan->shmem_base_addr,
>>>> +                     pcc_chan->shmem_size);
>>> Now driver doesn't need to ioremap repeatly because of Sudeep's work.
>>> Please see the patch:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fa362ffafa51b08cf8e2fcca38e056332f6b9b05
>> Cool. Now we can use pcc_chan->shmem. Will update the code based on that.
>>
>>>> +    if (!uncore->pcc_shmem_addr) {
>>>> +        rc = -ENOMEM;
>>>> +        goto err_pcc_chan_free;
>>>> +    }
>>>> +
>> ...
>>
>>>> +static int hisi_uncore_cmd_send(struct hisi_uncore_freq *uncore,
>>>> +                u8 cmd, u32 *data)
>>>> +{
>>>> +    struct hisi_uncore_pcc_shmem __iomem *addr;
>>>> +    struct hisi_uncore_pcc_shmem shmem;
>>>> +    struct pcc_mbox_chan *pchan;
>>>> +    unsigned int mrtt;
>>>> +    s64 time_delta;
>>>> +    u16 status;
>>>> +    int rc;
>>>> +
>>>> +    guard(mutex)(&uncore->pcc_lock);
>>>> +
>>>> +    pchan = uncore->pchan;
>>>> +    addr = uncore->pcc_shmem_addr;
>>>> +
>>>> +    if (!pchan || !addr)
>>>> +        return -ENODEV;
>>> This parameter validation seems redundant. From driver point, it always be ok.
>> uncore->pchan could be NULL if hisi_uncore_cmd_send() if called after
>> hisi_uncore_free_pcc_chan().
>>
>> In the driver detaching processor, the drv->remove() is called before devm
>> resources free, i.e. where devfreq device is freed.
>> Hence the devfreq governor, e.g. simpleondemand, could possibly issue a
>> freq request after the driver is removed.
>>
>> The checking and setting uncore->pchan = NULL in this driver is trying to
>> protect against the above case.  It actually happened during our tests so
>> it's not redundant unless we get a better way to solve that.
>>
>>>> +
>>>> +    /* Handle the Minimum Request Turnaround Time (MRTT) */
>>>> +    mrtt = pchan->min_turnaround_time;
>>>> +    time_delta = ktime_us_delta(ktime_get(),
>>>> +                    uncore->last_cmd_cmpl_time);
>>>> +    if (mrtt > time_delta)
>>>> +        udelay(mrtt - time_delta);
>>>> +
>>>> +    /* Copy data */
>>>> +    shmem.head = (struct acpi_pcct_shared_memory) {
>>>> +        .signature = PCC_SIGNATURE | uncore->chan_id,
>>>> +        .command = cmd,
>>>> +        .status = 0,
>>>> +    };
>>>> +    shmem.pcc_data.data = *data;
>>>> +    memcpy_toio(addr, &shmem, sizeof(shmem));
>>>> +
>>>> +    /* Ring doorbell */
>>>> +    rc = mbox_send_message(pchan->mchan, &cmd);
>>>> +    if (rc < 0) {
>>>> +        dev_err(uncore->dev, "Failed to send mbox message, %d\n", rc);
>>>> +        return rc;
>>>> +    }
>>>> +
>>>> +    /* Wait status */
>>>> +    rc = readw_poll_timeout(&addr->head.status, status,
>>>> +                status & (PCC_STATUS_CMD_COMPLETE |
>>>> +                      PCC_STATUS_ERROR),
>>>> +                HUCF_PCC_POLL_INTERVAL_US,
>>>> +                pchan->latency * HUCF_PCC_POLL_TIMEOUT_NUM);
>>>> +    if (rc) {
>>>> +        dev_err(uncore->dev, "PCC channel response timeout, cmd=%u\n", cmd);
>>>> +        goto exit;
>>>> +    }
>>>> +
>>>> +    if (status & PCC_STATUS_ERROR) {
>>>> +        dev_err(uncore->dev, "PCC cmd error, cmd=%u\n", cmd);
>>>> +        rc = -EIO;
>>>> +        goto exit;
>>>> +    }
>>>> +
>>>> +exit:
>>>> +    uncore->last_cmd_cmpl_time = ktime_get();
>>>> +
>>>> +    /* Copy data back */
>>>> +    memcpy_fromio(data, &addr->pcc_data.data, sizeof(*data));
>>>> +
>>>> +    /* Clear mailbox active req */
>>>> +    mbox_client_txdone(pchan->mchan, rc);
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +static int hisi_uncore_target(struct device *dev, unsigned long *freq,
>>>> +                  u32 flags)
>>>> +{
>>>> +    struct hisi_uncore_freq *uncore = dev_get_drvdata(dev);
>>>> +    struct dev_pm_opp *opp;
>>>> +    u32 data;
>>>> +
>>>> +    /*
>>>> +     * ->target() might be called after drv->remove() and before governor
>>>> +     * stopped, so it's necessary to check here, but no need to warn.
>>>> +     */
>>>> +    if (!uncore || !uncore->pchan)
>>>> +        return 0;
>>> If have the following tips, this check can be removed, right?
>> See below.
>>
>> ...
> We need to resolve the issue completely.🙂
>>>> +
>>>> +static int hisi_uncore_devfreq_register(struct hisi_uncore_freq *uncore)
>>>> +{
>>>> +    struct devfreq_dev_profile *profile;
>>>> +    unsigned long freq;
>>>> +    u32 data;
>>>> +    int rc;
>>>> +
>>>> +    rc = hisi_uncore_get_cur_freq(uncore->dev, &freq);
>>>> +    if (rc) {
>>>> +        dev_err(uncore->dev, "Failed to get plat init freq (%d)\n", rc);
>>>> +        return rc;
>>>> +    }
>>>> +
>>>> +    profile = devm_kzalloc(uncore->dev, sizeof(*profile), GFP_KERNEL);
>>>> +    if (!profile)
>>>> +        return -ENOMEM;
>>>> +
>>>> +    profile->initial_freq = freq;
>>>> +    profile->polling_ms = DEFAULT_POLLING_MS;
>>>> +    profile->timer = DEVFREQ_TIMER_DELAYED;
>>>> +    profile->target = hisi_uncore_target;
>>>> +    profile->get_dev_status = hisi_uncore_get_dev_status;
>>>> +    profile->get_cur_freq = hisi_uncore_get_cur_freq;
>>>> +
>>>> +    rc = hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_GET_MODE, &data);
>>>> +    if (rc) {
>>>> +        dev_err(uncore->dev, "Failed to get operate mode (%d)\n", rc);
>>>> +        return rc;
>>>> +    }
>>>> +
>>>> +    if (data == HUCF_MODE_PLATFORM)
>>>> +        uncore->devfreq = devm_devfreq_add_device(uncore->dev, profile,
>>>> +                              hisi_platform_governor.name, NULL);
>>>> +    else
>>>> +        uncore->devfreq = devm_devfreq_add_device(uncore->dev, profile,
>>>> +                              DEVFREQ_GOV_PERFORMANCE, NULL);
>>> This code creates sysfs file which will be freed after .remove() executed.
>>> However, the PCC channel is released in .remove().
>>> This may lead to UAF issue.
>> Not only those sysfs files, the whole devfreq device & governor is still
>> there after drv->remove().  That's the main cause of issues when
>> registering a devfreq device by devm_devfreq_add_device().
>>
>> However, at the moment, we have prevented against any UAF issues?
> I understand you. But please see two options below again.
>>
>>> We have tow options:
>>> *option 1:*
>>> hisi_uncore_free_pcc_chan() also move to the last release instead of the .remove() by devm_add_action_or_reset() or devm_add_action().
>> I don't quite understand what this means.
>> hisi_uncore_free_pcc_chan() is currently the last step of drv->remove().
>> If we let it be managed by devm, it can't be guaranteed that the pcc chan
>> is freed after devfreq device is removed.
> This option is just to let hisi_uncore_free_pcc_chan() to call after removing all sysfs files created in devfreq.
>>
>>> *option 2:*
>>> Use devfreq_add_device() to replace devm_devfreq_add_device() and .remove() in driver call devfreq_remove_device() to delete these sysfs files.
>>>
>>> There are the resources  dependency between these interfaces. And there is a potential release sequence in .remove, devm_xxx, and even the devm_xxx release inside.
>>> I prefer to explicitly free interfaces that have resource dependencies in driver.
>>>
>>>
>> This looks like a possible plan, but we will need to go back to the raw
>> ABIs.
> 
> Yeah, this option is simple.
> 
> Whether driver selects the raw ABI or devm_ prefixed ABI, which should be open, I think. This depends on the user's preference.
> 

Sure, understood now.
Option 1: Add the devm resources in a correct order and let devm manage the
release sequence.
Option 2: Explicitly define the release sequence in the driver remove.

Since currently only one devfreq driver uses the non-devm ABI, I may prefer
to try the Option 1 first.



More information about the linux-arm-kernel mailing list