[PATCH] drivers/perf: riscv_pmu_sbi: add perf_user_access sysctl

Atish Patra atishp at atishpatra.org
Thu Sep 29 01:11:08 PDT 2022


On Fri, Aug 26, 2022 at 3:10 PM Atish Patra <atishp at atishpatra.org> wrote:
>
>
>
> On Fri, Aug 26, 2022 at 8:16 AM Heiko Stuebner <heiko at sntech.de> wrote:
>>
>> Add a sysctl similar to the one on arm64 to enable/disable
>> access to counter CSRs from u-mode on RISC-V.
>>
>> The default is of course set to disabled keeping the current
>> state of access - to only the TIME CSR.
>>
>> Signed-off-by: Heiko Stuebner <heiko at sntech.de>
>> ---
>>  Documentation/admin-guide/sysctl/kernel.rst |  6 +--
>>  drivers/perf/riscv_pmu_sbi.c                | 43 ++++++++++++++++++++-
>>  2 files changed, 44 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>> index ee6572b1edad..efd4bc385e7a 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -894,15 +894,15 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
>>  The default value is 8.
>>
>>
>> -perf_user_access (arm64 only)
>> -=================================
>> +perf_user_access (arm64 and riscv only)
>> +=======================================
>>
>>  Controls user space access for reading perf event counters. When set to 1,
>>  user space can read performance monitor counter registers directly.
>>
>>  The default value is 0 (access disabled).
>>
>> -See Documentation/arm64/perf.rst for more information.
>> +See Documentation/arm64/perf.rst for more information on arm64
>>
>>
>>  pid_max
>> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
>> index 6f6681bbfd36..7aab8d673357 100644
>> --- a/drivers/perf/riscv_pmu_sbi.c
>> +++ b/drivers/perf/riscv_pmu_sbi.c
>> @@ -41,6 +41,8 @@ static const struct attribute_group *riscv_pmu_attr_groups[] = {
>>         NULL,
>>  };
>>
>> +static int sysctl_perf_user_access __read_mostly;
>> +
>>  /*
>>   * RISC-V doesn't have hetergenous harts yet. This need to be part of
>>   * per_cpu in case of harts with different pmu counters
>> @@ -640,13 +642,22 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
>>         return IRQ_HANDLED;
>>  }
>>
>> +/*
>> + * Depending on the perf_user_access setting, enable the access
>> + * from usermode either for all counters or for TIME csr only.
>> + */
>> +static void riscv_pmu_update_user_access(void *info)
>> +{
>> +       csr_write(CSR_SCOUNTEREN, sysctl_perf_user_access ? GENMASK(31, 0) :
>> +                                                           0x2);
>> +}
>> +
>>  static int pmu_sbi_starting_cpu(unsigned int cpu, struct hlist_node *node)
>>  {
>>         struct riscv_pmu *pmu = hlist_entry_safe(node, struct riscv_pmu, node);
>>         struct cpu_hw_events *cpu_hw_evt = this_cpu_ptr(pmu->hw_events);
>>
>> -       /* Enable the access for TIME csr only from the user mode now */
>> -       csr_write(CSR_SCOUNTEREN, 0x2);
>> +       riscv_pmu_update_user_access(NULL);
>>
>>         /* Stop all the counters so that they can be enabled from perf */
>>         pmu_sbi_stop_all(pmu);
>> @@ -785,6 +796,32 @@ static void riscv_pmu_destroy(struct riscv_pmu *pmu)
>>         cpuhp_state_remove_instance(CPUHP_AP_PERF_RISCV_STARTING, &pmu->node);
>>  }
>>
>> +static int riscv_pmu_proc_user_access_handler(struct ctl_table *table,
>> +                       int write, void *buffer, size_t *lenp, loff_t *ppos)
>> +{
>> +       int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>> +
>> +       if (ret || !write)
>> +               return ret;
>> +
>> +       on_each_cpu(riscv_pmu_update_user_access, NULL, 1);
>> +
>
>
> This enables all the counter access from the user space. Not the ones that were being monitored through perf system calls.
> Looking at ARM64 code, they clear out all other counters during the start to avoid the leakage.
>
> This is a bit expensive to achieve with the current SBI interface as there is no simpler interface to clear out the mhpmcounter value without starting it.
> We have to start the other counters with 0 value and stop it immediately which is very ugly.
>
> There is a supervisor counter delegation proposal that is in discussion which can write the mhpmevent/mhpmcounter (via a different CSR) directly from S-mode.
> Once that is in place, we can enable such functionality.
>
> I am not sure how commonly this feature is used. Personally, I would prefer to not enable this until supervisor counter delegation is implemented.
> But I would like to hear what others think.
>
> If it is a very common feature that RISC-V should support immediately, we have to resort to the ugly route of enabling/disabling the unused counters.
>

I just realized my above response was never delivered to the mailing
list. In a way that's a good thing as I am trying to retract my above
opinion :)
There are some discussions around enabling cycle/instret for the user
space unconditionally [1] and a patch from palmer[2]
To allow userspace to selectively enable the cycle counter, we need
this patch. Thus, it should be merged asap.

I am still not sure how big the problem is for exposing the previously
used counter values. We can have separate discussions on that and a
follow up patch if required.

[1] https://lore.kernel.org/all/YxEhC%2FmDW1lFt36J@aurel32.net/
[2] https://lore.kernel.org/linux-riscv/YzS0RNzH2CprLSyc@spud/T/#ma8e0034fa9172a66820d6d5fbc3314619ff657a5


>> +       return 0;
>> +}
>> +
>> +static struct ctl_table sbi_pmu_sysctl_table[] = {
>> +       {
>> +               .procname       = "perf_user_access",
>> +               .data           = &sysctl_perf_user_access,
>> +               .maxlen         = sizeof(unsigned int),
>> +               .mode           = 0644,
>> +               .proc_handler   = riscv_pmu_proc_user_access_handler,
>> +               .extra1         = SYSCTL_ZERO,
>> +               .extra2         = SYSCTL_ONE,
>> +       },
>> +       { }
>> +};
>> +
>>  static int pmu_sbi_device_probe(struct platform_device *pdev)
>>  {
>>         struct riscv_pmu *pmu = NULL;
>> @@ -834,6 +871,8 @@ static int pmu_sbi_device_probe(struct platform_device *pdev)
>>         if (ret)
>>                 goto out_unregister;
>>
>> +       register_sysctl("kernel", sbi_pmu_sysctl_table);
>> +
>>         return 0;
>>
>>  out_unregister:
>> --
>> 2.35.1
>>
>
>
> --
> Regards,
> Atish



--
Regards,
Atish



More information about the linux-riscv mailing list