[PATCH v2] perf: arm_cspmu: Separate Arm and vendor module

Suzuki K Poulose suzuki.poulose at arm.com
Wed Apr 19 14:31:58 PDT 2023


On 18/04/2023 20:33, Besar Wicaksono wrote:
> Hi Suzuki,
> 
>> -----Original Message-----
>> From: Suzuki K Poulose <suzuki.poulose at arm.com>
>> Sent: Tuesday, April 18, 2023 6:07 AM
>> To: Besar Wicaksono <bwicaksono at nvidia.com>; catalin.marinas at arm.com;
>> will at kernel.org; mark.rutland at arm.com
>> Cc: linux-arm-kernel at lists.infradead.org; linux-kernel at vger.kernel.org; linux-
>> tegra at vger.kernel.org; Thierry Reding <treding at nvidia.com>; Jonathan
>> Hunter <jonathanh at nvidia.com>; Vikram Sethi <vsethi at nvidia.com>; Richard
>> Wiley <rwiley at nvidia.com>; Eric Funsten <efunsten at nvidia.com>
>> Subject: Re: [PATCH v2] perf: arm_cspmu: Separate Arm and vendor module
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 18/04/2023 07:20, Besar Wicaksono wrote:
>>> Arm Coresight PMU driver consists of main standard code and vendor
>>> backend code. Both are currently built as a single module.
>>> This patch adds vendor registration API to separate the two to
>>> keep things modular. Vendor module shall register to the main
>>> module on loading and trigger device reprobe.
>>>
>>> Signed-off-by: Besar Wicaksono <bwicaksono at nvidia.com>
>>> ---
>>>
>>> Changes from v1:
>>>    * Added separate Kconfig entry for nvidia backend
>>>    * Added lock to protect accesses to the lists
>>>    * Added support for matching subset devices from a vendor
>>>    * Added state tracking to avoid reprobe when a device is in use
>>> v1: ttps://lore.kernel.org/linux-arm-kernel/20230403163905.20354-1-
>> bwicaksono at nvidia.com/T/#u
>>>
>>> ---
>>>    drivers/perf/arm_cspmu/Kconfig        |   9 +-
>>>    drivers/perf/arm_cspmu/Makefile       |   6 +-
>>>    drivers/perf/arm_cspmu/arm_cspmu.c    | 280
>> +++++++++++++++++++++++---
>>>    drivers/perf/arm_cspmu/arm_cspmu.h    |  32 ++-
>>>    drivers/perf/arm_cspmu/nvidia_cspmu.c |  39 +++-
>>>    drivers/perf/arm_cspmu/nvidia_cspmu.h |  17 --
>>>    6 files changed, 325 insertions(+), 58 deletions(-)
>>>    delete mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.h
>>>
>>> diff --git a/drivers/perf/arm_cspmu/Kconfig
>> b/drivers/perf/arm_cspmu/Kconfig
>>> index 0b316fe69a45..8ce7b45a0075 100644
>>> --- a/drivers/perf/arm_cspmu/Kconfig
>>> +++ b/drivers/perf/arm_cspmu/Kconfig
>>> @@ -1,6 +1,6 @@
>>>    # SPDX-License-Identifier: GPL-2.0
>>>    #
>>> -# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights
>> reserved.
>>> +# Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights
>> reserved.
>>>
>>>    config ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
>>>        tristate "ARM Coresight Architecture PMU"
>>> @@ -11,3 +11,10 @@ config ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
>>>          based on ARM CoreSight PMU architecture. Note that this PMU
>>>          architecture does not have relationship with the ARM CoreSight
>>>          Self-Hosted Tracing.
>>> +
>>> +config NVIDIA_CORESIGHT_PMU_ARCH_SYSTEM_PMU
>>> +     tristate "NVIDIA Coresight Architecture PMU"
>>> +     depends on ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
>>> +     help
>>> +       Provides NVIDIA specific attributes for performance monitoring unit
>>> +       (PMU) devices based on ARM CoreSight PMU architecture.
>>> diff --git a/drivers/perf/arm_cspmu/Makefile
>> b/drivers/perf/arm_cspmu/Makefile
>>> index fedb17df982d..f8ae22411d59 100644
>>> --- a/drivers/perf/arm_cspmu/Makefile
>>> +++ b/drivers/perf/arm_cspmu/Makefile
>>> @@ -1,6 +1,6 @@
>>> -# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights
>> reserved.
>>> +# Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights
>> reserved.
>>>    #
>>>    # SPDX-License-Identifier: GPL-2.0
>>>
>>> -obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) +=
>> arm_cspmu_module.o
>>> -arm_cspmu_module-y := arm_cspmu.o nvidia_cspmu.o
>>> +obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) +=
>> arm_cspmu.o
>>> +obj-$(CONFIG_NVIDIA_CORESIGHT_PMU_ARCH_SYSTEM_PMU) +=
>> nvidia_cspmu.o
>>> diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c
>> b/drivers/perf/arm_cspmu/arm_cspmu.c
>>> index e31302ab7e37..c55ea2b74454 100644
>>> --- a/drivers/perf/arm_cspmu/arm_cspmu.c
>>> +++ b/drivers/perf/arm_cspmu/arm_cspmu.c
>>> @@ -16,7 +16,7 @@
>>>     * The user should refer to the vendor technical documentation to get
>> details
>>>     * about the supported events.
>>>     *
>>> - * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights
>> reserved.
>>> + * Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights
>> reserved.
>>>     *
>>>     */
>>>
>>> @@ -25,13 +25,14 @@
>>>    #include <linux/ctype.h>
>>>    #include <linux/interrupt.h>
>>>    #include <linux/io-64-nonatomic-lo-hi.h>
>>> +#include <linux/list.h>
>>>    #include <linux/module.h>
>>> +#include <linux/mutex.h>
>>>    #include <linux/perf_event.h>
>>>    #include <linux/platform_device.h>
>>>    #include <acpi/processor.h>
>>>
>>>    #include "arm_cspmu.h"
>>> -#include "nvidia_cspmu.h"
>>>
>>>    #define PMUNAME "arm_cspmu"
>>>    #define DRVNAME "arm-cs-arch-pmu"
>>> @@ -117,11 +118,52 @@
>>>     */
>>>    #define HILOHI_MAX_POLL     1000
>>>
>>> -/* JEDEC-assigned JEP106 identification code */
>>> -#define ARM_CSPMU_IMPL_ID_NVIDIA             0x36B
>>> -
>>>    static unsigned long arm_cspmu_cpuhp_state;
>>>
>>> +/* List of Coresight PMU instances in the system. */
>>> +static LIST_HEAD(arm_cspmus);
>>> +
>>> +/* List of registered vendor backends. */
>>> +static LIST_HEAD(arm_cspmu_impls);
>>> +
>>> +static DEFINE_MUTEX(arm_cspmu_lock);
>>> +
>>> +/*
>>> + * State of the generic driver.
>>> + * 0 => registering backend.
>>> + * 1 => ready to use.
>>> + * 2 or more => in use.
>>> + */
>>> +#define ARM_CSPMU_STATE_REG  0
>>> +#define ARM_CSPMU_STATE_READY        1
>>> +static atomic_t arm_cspmu_state;
>>> +
>>> +static void arm_cspmu_state_ready(void)
>>> +{
>>> +     atomic_set(&arm_cspmu_state, ARM_CSPMU_STATE_READY);
>>> +}
>>> +
>>> +static bool try_arm_cspmu_state_reg(void)
>>> +{
>>> +     const int old = ARM_CSPMU_STATE_READY;
>>> +     const int new = ARM_CSPMU_STATE_REG;
>>> +
>>> +     return atomic_cmpxchg(&arm_cspmu_state, old, new) == old;
>>> +}
>>> +
>>> +static bool try_arm_cspmu_state_get(void)
>>> +{
>>> +     return atomic_inc_not_zero(&arm_cspmu_state);
>>> +}
>>> +
>>> +static void arm_cspmu_state_put(void)
>>> +{
>>> +     int ret;
>>> +
>>> +     ret = atomic_dec_if_positive(&arm_cspmu_state);
>>> +     WARN_ON(ret < 0);
>>> +}
>>> +
>>
>> As long as the vendor module is set for the PMU instance, it won't be
>> unloaded as long as there are any perf events and thus the specific
>> driver cannot be unloaded. So, you don't need explicit refcount
>> maintenance for each pmu callbacks.
>>
> 
> The arm_cspmu_state mainly protects during new backend registration when
> the device is attached to standard implementation. All devices are attached to
> standard implementation initially when arm_cspmu module is loaded, since there
> is no backend yet. On backend registration, the standard impl is replaced by
> backend impl. However, the module unloading mechanism doesn't provide
> protection because standard impl is owned by arm_cspmu module, which
> is not unloaded during registration.
> 
> The refcount may not be required if the devices are not attached to standard
> Implementation by default, and the unreg doesn't fallback to it. But that makes
> the devices usable only when there is a vendor backend available.

Ok, thanks for the explanation. But I still think we :

  - Don't need a single global refcount for all the PMUs. Instead this
    could be per PMU instance (arm_cspmu), only when it is backed by
    "generic" backend, others get module refcount. If the refcount of
    "generic" PMU is positive, during the registration of a matching
    backend driver, we could simply mark that as pending reprobe.

  - And only do the refcount for the following call backs:

   pmu:: event_init -> hold the refcount
   pmu:: destroy -> drop the refcount and trigger a reprobe if one was
         pending (see above)

This would allow loading (unrelated) modules even when other PMUs are
active.

>>> +static bool impl_param_match(const struct arm_cspmu_impl_param *A,
>>> +                          const struct arm_cspmu_impl_param *B)
>>> +{
>>> +     /*
>>> +      * Match criteria:
>>> +      * - Implementer id should match.
>>> +      * - A's device id is within B's range, or vice versa. This allows
>>> +      *   vendor to register backend for a range of devices.
>>> +      */
>>> +     if ((A->impl_id == B->impl_id) &&
>>> +         (((A->pvr & A->pvr_mask) == (B->pvr & A->pvr_mask)) ||
>>> +          ((A->pvr & B->pvr_mask) == (B->pvr & B->pvr_mask))))
>>> +             return true;
>>> +
>>
>> nit: Please do not use CAPITAL letters for variable names. Could this
>> simply accept a pmiidr and a impl_match and match the fields with that
>> of the mask/value pair. See more below.
>>
> 
> The bitfield comparison in impl_param_match can be reused for different purposes:
> 1. Test between two mask/value pairs to check if device id ranges from
>       both pairs are intersecting. We use it during validation in (un)registration.
> 2. Along with to_impl_param, we can compare a pmiidr value with a
>       mask/value pair to check if a device is compatible with a backend. We use
>       it during reprobe and initializing impl_ops.

Ok.

Suzuki



More information about the linux-arm-kernel mailing list