[PATCH v5] perf: arm_cspmu: Separate Arm and vendor module

Sat Aug 19 13:11:11 PDT 2023

Hi Suzuki,

> -----Original Message-----
> From: Suzuki K Poulose <suzuki.poulose at arm.com>
> Sent: Friday, August 11, 2023 6:29 AM
> To: Besar Wicaksono <bwicaksono at nvidia.com>; robin.murphy at arm.com;
> ilkka at os.amperecomputing.com; catalin.marinas at arm.com; will at kernel.org;
> mark.rutland at arm.com
> Cc: linux-arm-kernel at lists.infradead.org; linux-kernel at vger.kernel.org; linux-
> tegra at vger.kernel.org; Thierry Reding <treding at nvidia.com>; Jonathan
> Hunter <jonathanh at nvidia.com>; Vikram Sethi <vsethi at nvidia.com>; Richard
> Wiley <rwiley at nvidia.com>; Eric Funsten <efunsten at nvidia.com>
> Subject: Re: [PATCH v5] perf: arm_cspmu: Separate Arm and vendor module
> 
> External email: Use caution opening links or attachments
> 
> 
> On 05/07/2023 11:47, Besar Wicaksono wrote:
> > Arm Coresight PMU driver consists of main standard code and
> > vendor backend code. Both are currently built as a single module.
> > This patch adds vendor registration API to separate the two to
> > keep things modular. The main driver requests each known backend
> > module during initialization and defer device binding process.
> > The backend module then registers an init callback to the main
> > driver and continue the device driver binding process.
> >
> > Signed-off-by: Besar Wicaksono <bwicaksono at nvidia.com>
> 
> Apologies for the delay.
> 
> > ---
> >
> > Changes from v4:
> >   * Fix warning reported by kernel test robot
> > v4: https://lore.kernel.org/linux-arm-kernel/20230620041438.32514-1-
> bwicaksono at nvidia.com/T/#u
> >
> > Changes from v3:
> >   * Move impl registration module back to main driver module.
> >   * Rebase from will (for-next/perf)
> > Thanks to Robin for the feedback.
> > v3: https://lore.kernel.org/linux-arm-kernel/20230505005956.22837-1-
> bwicaksono at nvidia.com/T/#u
> >
> > Changes from v2:
> >   * Move sysfs_event/format_show definition to arm_cspmu.h and move
> impl
> >     registration API definition to a separate module so main driver and vendor
> >     module are independent.
> >   * The registration API now just sets the impl_init_ops callback, no reprobe.
> >   * Add PMIIDR table that maps to the vendor module name. During device
> probe,
> >     main driver requests the vendor module if PMIIDR is matching.
> >   * Keeping the name of the main driver module as arm_cspmu_module.
> > Thanks to Robin and Suzuki for the feedback.
> > v2:  https://lore.kernel.org/linux-arm-kernel/20230418062030.45620-1-
> bwicaksono at nvidia.com/T/#u
> >
> > Changes from v1:
> >   * Added separate Kconfig entry for nvidia backend
> >   * Added lock to protect accesses to the lists
> >   * Added support for matching subset devices from a vendor
> >   * Added state tracking to avoid reprobe when a device is in use
> > Thanks to Suzuki for the feedback.
> > v1: https://lore.kernel.org/linux-arm-kernel/20230403163905.20354-1-
> bwicaksono at nvidia.com/T/#u
> >
> > ---
> >   drivers/perf/arm_cspmu/Kconfig        |   9 +-
> >   drivers/perf/arm_cspmu/Makefile       |   6 +-
> >   drivers/perf/arm_cspmu/arm_cspmu.c    | 166
> +++++++++++++++++++++-----
> >   drivers/perf/arm_cspmu/arm_cspmu.h    |  24 +++-
> >   drivers/perf/arm_cspmu/nvidia_cspmu.c |  34 +++++-
> >   drivers/perf/arm_cspmu/nvidia_cspmu.h |  17 ---
> >   6 files changed, 199 insertions(+), 57 deletions(-)
> >   delete mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.h
> >
> > diff --git a/drivers/perf/arm_cspmu/Kconfig
> b/drivers/perf/arm_cspmu/Kconfig
> > index 25d25ded0983..d5f787d22234 100644
> > --- a/drivers/perf/arm_cspmu/Kconfig
> > +++ b/drivers/perf/arm_cspmu/Kconfig
> > @@ -1,6 +1,6 @@
> >   # SPDX-License-Identifier: GPL-2.0
> >   #
> > -# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights
> reserved.
> > +# Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights
> reserved.
> >
> >   config ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
> >       tristate "ARM Coresight Architecture PMU"
> > @@ -10,3 +10,10 @@ config ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
> >         based on ARM CoreSight PMU architecture. Note that this PMU
> >         architecture does not have relationship with the ARM CoreSight
> >         Self-Hosted Tracing.
> > +
> > +config NVIDIA_CORESIGHT_PMU_ARCH_SYSTEM_PMU
> > +     tristate "NVIDIA Coresight Architecture PMU"
> > +     depends on ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
> > +     help
> > +       Provides NVIDIA specific attributes for performance monitoring unit
> > +       (PMU) devices based on ARM CoreSight PMU architecture.
> > diff --git a/drivers/perf/arm_cspmu/Makefile
> b/drivers/perf/arm_cspmu/Makefile
> > index fedb17df982d..0309d2ff264a 100644
> > --- a/drivers/perf/arm_cspmu/Makefile
> > +++ b/drivers/perf/arm_cspmu/Makefile
> > @@ -1,6 +1,8 @@
> > -# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights
> reserved.
> > +# Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights
> reserved.
> >   #
> >   # SPDX-License-Identifier: GPL-2.0
> >
> >   obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) +=
> arm_cspmu_module.o
> > -arm_cspmu_module-y := arm_cspmu.o nvidia_cspmu.o
> > +arm_cspmu_module-y := arm_cspmu.o
> > +
> > +obj-$(CONFIG_NVIDIA_CORESIGHT_PMU_ARCH_SYSTEM_PMU) +=
> nvidia_cspmu.o
> > diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c
> b/drivers/perf/arm_cspmu/arm_cspmu.c
> > index e2b7827c4563..04be94b4aa48 100644
> > --- a/drivers/perf/arm_cspmu/arm_cspmu.c
> > +++ b/drivers/perf/arm_cspmu/arm_cspmu.c
> > @@ -16,7 +16,7 @@
> >    * The user should refer to the vendor technical documentation to get
> details
> >    * about the supported events.
> >    *
> > - * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights
> reserved.
> > + * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights
> reserved.
> >    *
> >    */
> >
> > @@ -26,11 +26,11 @@
> >   #include <linux/interrupt.h>
> >   #include <linux/io-64-nonatomic-lo-hi.h>
> >   #include <linux/module.h>
> > +#include <linux/mutex.h>
> >   #include <linux/perf_event.h>
> >   #include <linux/platform_device.h>
> >
> >   #include "arm_cspmu.h"
> > -#include "nvidia_cspmu.h"
> >
> >   #define PMUNAME "arm_cspmu"
> >   #define DRVNAME "arm-cs-arch-pmu"
> > @@ -112,11 +112,10 @@
> >    */
> >   #define HILOHI_MAX_POLL     1000
> >
> > -/* JEDEC-assigned JEP106 identification code */
> > -#define ARM_CSPMU_IMPL_ID_NVIDIA             0x36B
> > -
> >   static unsigned long arm_cspmu_cpuhp_state;
> >
> > +static DEFINE_MUTEX(arm_cspmu_lock);
> > +
> >   static struct acpi_apmt_node *arm_cspmu_apmt_node(struct device *dev)
> >   {
> >       return *(struct acpi_apmt_node **)dev_get_platdata(dev);
> > @@ -373,27 +372,37 @@ static struct attribute_group
> arm_cspmu_cpumask_attr_group = {
> >       .attrs = arm_cspmu_cpumask_attrs,
> >   };
> >
> > -struct impl_match {
> > -     u32 pmiidr;
> > -     u32 mask;
> > -     int (*impl_init_ops)(struct arm_cspmu *cspmu);
> > -};
> > -
> > -static const struct impl_match impl_match[] = {
> > +static struct arm_cspmu_impl_match impl_match[] = {
> >       {
> > -       .pmiidr = ARM_CSPMU_IMPL_ID_NVIDIA,
> > -       .mask = ARM_CSPMU_PMIIDR_IMPLEMENTER,
> > -       .impl_init_ops = nv_cspmu_init_ops
> > +             .module_name    = "nvidia_cspmu",
> > +             .pmiidr_val     = ARM_CSPMU_IMPL_ID_NVIDIA,
> > +             .pmiidr_mask    = ARM_CSPMU_PMIIDR_IMPLEMENTER,
> > +             .module         = NULL,
> > +             .impl_init_ops  = NULL,
> >       },
> > -     {}
> > +     {0}
> >   };
> >
> > +static struct arm_cspmu_impl_match *arm_cspmu_impl_match_get(u32
> pmiidr)
> > +{
> > +     struct arm_cspmu_impl_match *match = impl_match;
> > +
> > +     for (; match->pmiidr_val; match++) {
> > +             u32 mask = match->pmiidr_mask;
> > +
> > +             if ((match->pmiidr_val & mask) == (pmiidr & mask))
> > +                     break;
> > +     }
> > +
> > +     return match;
> > +}
> > +
> >   static int arm_cspmu_init_impl_ops(struct arm_cspmu *cspmu)
> >   {
> > -     int ret;
> > +     int ret = 0;
> >       struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
> >       struct acpi_apmt_node *apmt_node = arm_cspmu_apmt_node(cspmu-
> >dev);
> > -     const struct impl_match *match = impl_match;
> > +     struct arm_cspmu_impl_match *match;
> >
> >       /*
> >        * Get PMU implementer and product id from APMT node.
> > @@ -405,16 +414,31 @@ static int arm_cspmu_init_impl_ops(struct
> arm_cspmu *cspmu)
> >                                      readl(cspmu->base0 + PMIIDR);
> >
> >       /* Find implementer specific attribute ops. */
> > -     for (; match->pmiidr; match++) {
> > -             const u32 mask = match->mask;
> > +     match = arm_cspmu_impl_match_get(cspmu->impl.pmiidr);
> > +
> > +     /* Load implementer module and initialize the callbacks. */
> > +     if (match) {
> > +             mutex_lock(&arm_cspmu_lock);
> > +
> > +             if (match->impl_init_ops) {
> > +                     if (try_module_get(match->module)) {
> > +                             cspmu->impl.match = match;
> > +                             ret = match->impl_init_ops(cspmu);
> > +                             module_put(match->module);
> > +                     } else {
> > +                             WARN(1, "arm_cspmu failed to get module: %s\n",
> > +                                     match->module_name);
> > +                             ret = -EINVAL;
> > +                     }
> > +             } else {
> > +                     request_module_nowait(match->module_name);
> > +                     ret = -EPROBE_DEFER;
> > +             }
> >
> > -             if ((match->pmiidr & mask) == (cspmu->impl.pmiidr & mask)) {
> > -                     ret = match->impl_init_ops(cspmu);
> > -                     if (ret)
> > -                             return ret;
> > +             mutex_unlock(&arm_cspmu_lock);
> >
> > -                     break;
> > -             }
> > +             if (ret)
> > +                     return ret;
> >       }
> >
> >       /* Use default callbacks if implementer doesn't provide one. */
> > @@ -478,11 +502,6 @@ arm_cspmu_alloc_attr_group(struct arm_cspmu
> *cspmu)
> >       struct attribute_group **attr_groups = NULL;
> >       struct device *dev = cspmu->dev;
> >       const struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
> > -     int ret;
> > -
> > -     ret = arm_cspmu_init_impl_ops(cspmu);
> > -     if (ret)
> > -             return NULL;
> >
> >       cspmu->identifier = impl_ops->get_identifier(cspmu);
> >       cspmu->name = impl_ops->get_name(cspmu);
> > @@ -1129,6 +1148,11 @@ static int arm_cspmu_get_cpus(struct
> arm_cspmu *cspmu)
> >       return arm_cspmu_acpi_get_cpus(cspmu);
> >   }
> >
> > +static inline struct module *arm_cspmu_get_module(struct arm_cspmu
> *cspmu)
> > +{
> > +     return (cspmu->impl.match) ? cspmu->impl.match->module :
> THIS_MODULE;
> > +}
> > +
> >   static int arm_cspmu_register_pmu(struct arm_cspmu *cspmu)
> >   {
> >       int ret, capabilities;
> > @@ -1149,7 +1173,7 @@ static int arm_cspmu_register_pmu(struct
> arm_cspmu *cspmu)
> >
> >       cspmu->pmu = (struct pmu){
> >               .task_ctx_nr    = perf_invalid_context,
> > -             .module         = THIS_MODULE,
> > +             .module         = arm_cspmu_get_module(cspmu),
> 
> We are accessing the module field without the mutex and the
> impl.match->module could be reset ?

Good catch, it will be fixed on v6.

> 
> >               .pmu_enable     = arm_cspmu_enable,
> >               .pmu_disable    = arm_cspmu_disable,
> >               .event_init     = arm_cspmu_event_init,
> > @@ -1196,6 +1220,10 @@ static int arm_cspmu_device_probe(struct
> platform_device *pdev)
> >       if (ret)
> >               return ret;
> >
> > +     ret = arm_cspmu_init_impl_ops(cspmu);
> > +     if (ret)
> > +             return ret;
> > +
> >       ret = arm_cspmu_register_pmu(cspmu);
> >       if (ret)
> >               return ret;
> > @@ -1300,6 +1328,80 @@ static void __exit arm_cspmu_exit(void)
> >       cpuhp_remove_multi_state(arm_cspmu_cpuhp_state);
> >   }
> >
> > +int arm_cspmu_impl_register(const struct arm_cspmu_impl_match
> *impl_match)
> > +{
> > +     struct arm_cspmu_impl_match *match;
> > +     int ret = 0;
> > +
> > +     match = arm_cspmu_impl_match_get(impl_match->pmiidr_val);
> > +
> > +     if (match) {
> > +             mutex_lock(&arm_cspmu_lock);
> > +
> > +             if (!match->impl_init_ops) {
> > +                     match->module = impl_match->module;
> > +                     match->impl_init_ops = impl_match->impl_init_ops;
> > +             } else {
> > +                     /* Broken match table may contain non-unique entries */
> > +                     WARN(1, "arm_cspmu backend already registered for module:
> %s, pmiidr: 0x%x, mask: 0x%x\n",
> > +                             match->module_name,
> > +                             match->pmiidr_val,
> > +                             match->pmiidr_mask);
> > +
> > +                     ret = -EINVAL;
> > +             }
> > +
> > +             mutex_unlock(&arm_cspmu_lock);
> > +
> > +             if (!ret)
> > +                     ret = driver_attach(&arm_cspmu_driver.driver);
> > +     } else {
> > +             pr_err("arm_cspmu reg failed, unable to find a match for pmiidr:
> 0x%x\n",
> > +                     impl_match->pmiidr_val);
> > +
> > +             ret = -EINVAL;
> > +     }
> > +
> > +     return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(arm_cspmu_impl_register);
> > +
> > +static int arm_cspmu_match_device(struct device *dev, const void *match)
> > +{
> > +     struct arm_cspmu *cspmu =
> platform_get_drvdata(to_platform_device(dev));
> > +
> > +     return (cspmu && cspmu->impl.match == match) ? 1 : 0;
> > +}
> > +
> > +void arm_cspmu_impl_unregister(const struct arm_cspmu_impl_match
> *impl_match)
> > +{
> > +     struct device *dev;
> > +     struct arm_cspmu_impl_match *match;
> > +
> > +     match = arm_cspmu_impl_match_get(impl_match->pmiidr_val);
> > +
> > +     WARN_ON(!match);
> > +
> > +     if (match) {
> > +             /* Unbind the driver from all matching backend devices. */
> > +dev_release:
> > +             dev = driver_find_device(&arm_cspmu_driver.driver, NULL,
> > +                     match, arm_cspmu_match_device);
> > +             if (dev) {
> > +                     device_release_driver(dev);
> > +                     goto dev_release;
> > +             }
> 
> minor nit: We could simply do :
> 
> static int arm_cspmu_release_driver(struct device *dev, void *data)
> {
>         struct arm_cspmu *cspmu =
> platform_get_drvdata(to_platform_device(dev));
> 
>         if (cspmu && cspmu->impl.match == match)
>                 device_release_driver(dev);
>         return 0;
> }
> 
>                 ret = driver_for_each_device(&driver, NULL, match,
> arm_csmpu_release_driver);
> 

It doesn’t seem to work for me.
Is it safe to release while iterating via driver_for_each_device ?

Regards,
Besar