[PATCH v1] PM / devfreq: Add HiSilicon uncore frequency scaling driver

lihuisong (C) lihuisong at huawei.com
Mon May 19 18:36:18 PDT 2025


在 2025/5/19 16:03, Jie Zhan 写道:
> Hi Huisong,
>
> Thanks for reviewing.
>
> On 12/05/2025 10:17, lihuisong (C) wrote:
>> Hi Jie,
>>
>> There are some trivia advice. please have a look.
>>
>> /Huisong
>> 在 2025/5/6 10:14, Jie Zhan 写道:
>>> Add the HiSilicon uncore frequency scaling driver for Kunpeng SoCs based on
>>> the devfreq framework.  The uncore domain contains shared computing
>>> resources, including system interconnects and L3 cache.  The uncore
>>> frequency significantly impacts the system-wide performance as well as
>>> power consumption.  This driver adds support for runtime management of
>>> uncore frequency from kernel and userspace.  The main function includes
>>> setting and getting frequencies, changing frequency scaling policies, and
>>> querying the list of CPUs whose performance is significantly related to
>>> this uncore frequency domain, etc.  The driver communicates with a platform
>>> controller through an ACPI PCC mailbox to take the actual actions of
>>> frequency scaling.
>>>
>>> Co-developed-by: Lifeng Zheng <zhenglifeng1 at huawei.com>
>>> Signed-off-by: Lifeng Zheng <zhenglifeng1 at huawei.com>
>>> Signed-off-by: Jie Zhan <zhanjie9 at hisilicon.com>
>>> ---
>>>    drivers/devfreq/Kconfig            |  11 +
>>>    drivers/devfreq/Makefile           |   1 +
>>>    drivers/devfreq/hisi_uncore_freq.c | 722 +++++++++++++++++++++++++++++
>>>    3 files changed, 734 insertions(+)
>>>    create mode 100644 drivers/devfreq/hisi_uncore_freq.c
>>>
> ...
>
>>> +
>>> +static int hisi_uncore_request_pcc_chan(struct hisi_uncore_freq *uncore)
>>> +{
>>> +    struct pcc_mbox_chan *pcc_chan;
>>> +    int rc;
>>> +
>>> +    uncore->cl = (struct mbox_client) {
>>> +        .dev = uncore->dev,
>>> +        .tx_block = false,
>>> +        .knows_txdone = true,
>>> +    };
>>> +
>>> +    pcc_chan = pcc_mbox_request_channel(&uncore->cl, uncore->chan_id);
>>> +    if (IS_ERR(pcc_chan)) {
>>> +        dev_err(uncore->dev, "Failed to request PCC channel %u\n",
>>> +            uncore->chan_id);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    if (!pcc_chan->shmem_base_addr) {
>>> +        dev_err(uncore->dev, "Invalid PCC shared memory address\n");
>>> +        rc = -EINVAL;
>>> +        goto err_pcc_chan_free;
>>> +    }
>>> +
>>> +    if (pcc_chan->shmem_size < sizeof(struct hisi_uncore_pcc_shmem)) {
>>> +        dev_err(uncore->dev, "Invalid PCC shared memory size (%lluB)\n",
>>> +            pcc_chan->shmem_size);
>>> +        rc = -EINVAL;
>>> +        goto err_pcc_chan_free;
>>> +    }
>>> +
>>> +    uncore->pcc_shmem_addr = ioremap(pcc_chan->shmem_base_addr,
>>> +                     pcc_chan->shmem_size);
>> Now driver doesn't need to ioremap repeatly because of Sudeep's work.
>> Please see the patch:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fa362ffafa51b08cf8e2fcca38e056332f6b9b05
> Cool. Now we can use pcc_chan->shmem. Will update the code based on that.
>
>>> +    if (!uncore->pcc_shmem_addr) {
>>> +        rc = -ENOMEM;
>>> +        goto err_pcc_chan_free;
>>> +    }
>>> +
> ...
>
>>> +static int hisi_uncore_cmd_send(struct hisi_uncore_freq *uncore,
>>> +                u8 cmd, u32 *data)
>>> +{
>>> +    struct hisi_uncore_pcc_shmem __iomem *addr;
>>> +    struct hisi_uncore_pcc_shmem shmem;
>>> +    struct pcc_mbox_chan *pchan;
>>> +    unsigned int mrtt;
>>> +    s64 time_delta;
>>> +    u16 status;
>>> +    int rc;
>>> +
>>> +    guard(mutex)(&uncore->pcc_lock);
>>> +
>>> +    pchan = uncore->pchan;
>>> +    addr = uncore->pcc_shmem_addr;
>>> +
>>> +    if (!pchan || !addr)
>>> +        return -ENODEV;
>> This parameter validation seems redundant. From driver point, it always be ok.
> uncore->pchan could be NULL if hisi_uncore_cmd_send() if called after
> hisi_uncore_free_pcc_chan().
>
> In the driver detaching processor, the drv->remove() is called before devm
> resources free, i.e. where devfreq device is freed.
> Hence the devfreq governor, e.g. simpleondemand, could possibly issue a
> freq request after the driver is removed.
>
> The checking and setting uncore->pchan = NULL in this driver is trying to
> protect against the above case.  It actually happened during our tests so
> it's not redundant unless we get a better way to solve that.
>
>>> +
>>> +    /* Handle the Minimum Request Turnaround Time (MRTT) */
>>> +    mrtt = pchan->min_turnaround_time;
>>> +    time_delta = ktime_us_delta(ktime_get(),
>>> +                    uncore->last_cmd_cmpl_time);
>>> +    if (mrtt > time_delta)
>>> +        udelay(mrtt - time_delta);
>>> +
>>> +    /* Copy data */
>>> +    shmem.head = (struct acpi_pcct_shared_memory) {
>>> +        .signature = PCC_SIGNATURE | uncore->chan_id,
>>> +        .command = cmd,
>>> +        .status = 0,
>>> +    };
>>> +    shmem.pcc_data.data = *data;
>>> +    memcpy_toio(addr, &shmem, sizeof(shmem));
>>> +
>>> +    /* Ring doorbell */
>>> +    rc = mbox_send_message(pchan->mchan, &cmd);
>>> +    if (rc < 0) {
>>> +        dev_err(uncore->dev, "Failed to send mbox message, %d\n", rc);
>>> +        return rc;
>>> +    }
>>> +
>>> +    /* Wait status */
>>> +    rc = readw_poll_timeout(&addr->head.status, status,
>>> +                status & (PCC_STATUS_CMD_COMPLETE |
>>> +                      PCC_STATUS_ERROR),
>>> +                HUCF_PCC_POLL_INTERVAL_US,
>>> +                pchan->latency * HUCF_PCC_POLL_TIMEOUT_NUM);
>>> +    if (rc) {
>>> +        dev_err(uncore->dev, "PCC channel response timeout, cmd=%u\n", cmd);
>>> +        goto exit;
>>> +    }
>>> +
>>> +    if (status & PCC_STATUS_ERROR) {
>>> +        dev_err(uncore->dev, "PCC cmd error, cmd=%u\n", cmd);
>>> +        rc = -EIO;
>>> +        goto exit;
>>> +    }
>>> +
>>> +exit:
>>> +    uncore->last_cmd_cmpl_time = ktime_get();
>>> +
>>> +    /* Copy data back */
>>> +    memcpy_fromio(data, &addr->pcc_data.data, sizeof(*data));
>>> +
>>> +    /* Clear mailbox active req */
>>> +    mbox_client_txdone(pchan->mchan, rc);
>>> +
>>> +    return rc;
>>> +}
>>> +
>>> +static int hisi_uncore_target(struct device *dev, unsigned long *freq,
>>> +                  u32 flags)
>>> +{
>>> +    struct hisi_uncore_freq *uncore = dev_get_drvdata(dev);
>>> +    struct dev_pm_opp *opp;
>>> +    u32 data;
>>> +
>>> +    /*
>>> +     * ->target() might be called after drv->remove() and before governor
>>> +     * stopped, so it's necessary to check here, but no need to warn.
>>> +     */
>>> +    if (!uncore || !uncore->pchan)
>>> +        return 0;
>> If have the following tips, this check can be removed, right?
> See below.
>
> ...
We need to resolve the issue completely.🙂
>>> +
>>> +static int hisi_uncore_devfreq_register(struct hisi_uncore_freq *uncore)
>>> +{
>>> +    struct devfreq_dev_profile *profile;
>>> +    unsigned long freq;
>>> +    u32 data;
>>> +    int rc;
>>> +
>>> +    rc = hisi_uncore_get_cur_freq(uncore->dev, &freq);
>>> +    if (rc) {
>>> +        dev_err(uncore->dev, "Failed to get plat init freq (%d)\n", rc);
>>> +        return rc;
>>> +    }
>>> +
>>> +    profile = devm_kzalloc(uncore->dev, sizeof(*profile), GFP_KERNEL);
>>> +    if (!profile)
>>> +        return -ENOMEM;
>>> +
>>> +    profile->initial_freq = freq;
>>> +    profile->polling_ms = DEFAULT_POLLING_MS;
>>> +    profile->timer = DEVFREQ_TIMER_DELAYED;
>>> +    profile->target = hisi_uncore_target;
>>> +    profile->get_dev_status = hisi_uncore_get_dev_status;
>>> +    profile->get_cur_freq = hisi_uncore_get_cur_freq;
>>> +
>>> +    rc = hisi_uncore_cmd_send(uncore, HUCF_PCC_CMD_GET_MODE, &data);
>>> +    if (rc) {
>>> +        dev_err(uncore->dev, "Failed to get operate mode (%d)\n", rc);
>>> +        return rc;
>>> +    }
>>> +
>>> +    if (data == HUCF_MODE_PLATFORM)
>>> +        uncore->devfreq = devm_devfreq_add_device(uncore->dev, profile,
>>> +                              hisi_platform_governor.name, NULL);
>>> +    else
>>> +        uncore->devfreq = devm_devfreq_add_device(uncore->dev, profile,
>>> +                              DEVFREQ_GOV_PERFORMANCE, NULL);
>> This code creates sysfs file which will be freed after .remove() executed.
>> However, the PCC channel is released in .remove().
>> This may lead to UAF issue.
> Not only those sysfs files, the whole devfreq device & governor is still
> there after drv->remove().  That's the main cause of issues when
> registering a devfreq device by devm_devfreq_add_device().
>
> However, at the moment, we have prevented against any UAF issues?
I understand you. But please see two options below again.
>
>> We have tow options:
>> *option 1:*
>> hisi_uncore_free_pcc_chan() also move to the last release instead of the .remove() by devm_add_action_or_reset() or devm_add_action().
> I don't quite understand what this means.
> hisi_uncore_free_pcc_chan() is currently the last step of drv->remove().
> If we let it be managed by devm, it can't be guaranteed that the pcc chan
> is freed after devfreq device is removed.
This option is just to let hisi_uncore_free_pcc_chan() to call after 
removing all sysfs files created in devfreq.
>
>> *option 2:*
>> Use devfreq_add_device() to replace devm_devfreq_add_device() and .remove() in driver call devfreq_remove_device() to delete these sysfs files.
>>
>> There are the resources  dependency between these interfaces. And there is a potential release sequence in .remove, devm_xxx, and even the devm_xxx release inside.
>> I prefer to explicitly free interfaces that have resource dependencies in driver.
>>
>>
> This looks like a possible plan, but we will need to go back to the raw
> ABIs.

Yeah, this option is simple.

Whether driver selects the raw ABI or devm_ prefixed ABI, which should 
be open, I think. This depends on the user's preference.

>
> ...



More information about the linux-arm-kernel mailing list