[PATCH V1] perf: qcom: Add L3 cache PMU driver

Mon Mar 21 09:06:57 PDT 2016

On 2016-03-21 06:35, Mark Rutland wrote:
> On Fri, Mar 18, 2016 at 04:37:02PM -0400, Agustin Vega-Frias wrote:
>> This adds a new dynamic PMU to the Perf Events framework to program
>> and control the L3 cache PMUs in some Qualcomm Technologies SOCs.
>> 
>> The driver supports a distributed cache architecture where the overall
>> cache is comprised of multiple slices each with its own PMU. The 
>> driver
>> aggregates counts across the whole system to provide a global picture
>> of the metrics selected by the user.
>> 
>> The driver exports formatting and event information to sysfs so it can
>> be used by the perf user space tools with the syntaxes:
>>    perf stat -a -e l3cache/read-miss/
>>    perf stat -a -e l3cache/event=0x21/
>> 
>> Signed-off-by: Agustin Vega-Frias <agustinv at codeaurora.org>
>> ---
>>  arch/arm64/kernel/Makefile                   |   4 +
>>  arch/arm64/kernel/perf_event_qcom_l3_cache.c | 816 
>> +++++++++++++++++++++++++++
>>  2 files changed, 820 insertions(+)
>>  create mode 100644 arch/arm64/kernel/perf_event_qcom_l3_cache.c
> 
> Move this to drivers/bus (where the CCI and CCN PMU drivers live), or
> drivers/perf (where some common infrastructure lives).
> 
> This isn't architectural, and isn't CPU-specific, so it has no reason 
> to
> live in arch/arm64.
> 

I will move it to drivers/perf. Thanks.

>> +static
>> +int qcom_l3_cache__event_init(struct perf_event *event)
>> +{
>> +	struct hw_perf_event *hwc = &event->hw;
>> +
>> +	if (event->attr.type != l3cache_pmu.pmu.type)
>> +		return -ENOENT;
>> +
>> +	/*
>> +	 * There are no per-counter mode filters in the PMU.
>> +	 */
>> +	if (event->attr.exclude_user || event->attr.exclude_kernel ||
>> +	    event->attr.exclude_hv || event->attr.exclude_idle)
>> +		return -EINVAL;
>> +
>> +	hwc->idx = -1;
>> +
>> +	/*
>> +	 * Sampling not supported since these events are not 
>> core-attributable.
>> +	 */
>> +	if (hwc->sample_period)
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * Task mode not available, we run the counters as system counters,
>> +	 * not attributable to any CPU and therefore cannot attribute 
>> per-task.
>> +	 */
>> +	if (event->cpu < 0)
>> +		return -EINVAL;
>> +
>> +	return 0;
>> +}
> 
> Please follow what the other system PMUs do and (forcefully) ensure 
> that all
> events share the same CPU in event_init.
> 
> Otherwise, events can exist in multiple percpu contexts, and management
> thereof can race on things line pmu_{enable,disable}.
> 
> You'll also want to expose a cpumask to userspace for that, to ensure 
> that it
> only opens events on a single CPU.
> 
> For an example, see drivers/bus/arm-ccn.c. That also handles migrating
> events to a new CPU upon hotplug.

Understood. I will look at the CCN implementation as reference,

> 
>> +static
>> +int qcom_l3_cache__event_add(struct perf_event *event, int flags)
>> +{
>> +	struct l3cache_pmu *system = to_l3cache_pmu(event->pmu);
>> +	struct hw_perf_event *hwc = &event->hw;
>> +	int idx;
>> +	int prev_cpu;
>> +	int err = 0;
>> +
>> +	/*
>> +	 * We need to disable the pmu while adding the event, otherwise
>> +	 * the perf tick might kick-in and re-add this event.
>> +	 */
>> +	perf_pmu_disable(event->pmu);
>> +
>> +	/*
>> +	 * This ensures all events are on the same CPU context. No need to 
>> open
>> +	 * these on all CPUs since they are system events. The strategy here 
>> is
>> +	 * to set system->cpu when the first event is created and from that
>> +	 * point on, only events in the same CPU context will be allowed to 
>> be
>> +	 * active. system->cpu will get reset back to -1 when the last event
>> +	 * is deleted, please see qcom_l3_cache__event_del below.
>> +	 */
>> +	prev_cpu = atomic_cmpxchg(&system->cpu, -1, event->cpu);
>> +	if ((event->cpu != prev_cpu) && (prev_cpu != -1)) {
>> +		err = -EAGAIN;
>> +		goto out;
>> +	}
> 
> As above, handle this in event_init, as other system PMUs do.
> 
> Handling it here is rather late, and unnecessarily fragile.

Understood. I will look at the CCN implementation as reference,

> 
>> +/*
>> + * In some platforms interrupt resources might not come directly from 
>> the GIC,
>> + * but from separate IRQ circuitry that signals a summary IRQ to the 
>> GIC and
>> + * is handled by a secondary IRQ controller. This is problematic 
>> under ACPI boot
>> + * because the ACPI core does not use the Resource Source field on 
>> the Extended
>> + * Interrupt Descriptor, which in theory could be used to specify an 
>> alternative
>> + * IRQ controller.
>> +
>> + * For this reason in these platforms we implement the secondary IRQ 
>> controller
>> + * using the gpiolib and specify the IRQs as GpioInt resources, so 
>> when getting
>> + * an IRQ from the device we first try platform_get_irq and if it 
>> fails we try
>> + * devm_gpiod_get_index/gpiod_to_irq.
>> + */
>> +static
>> +int qcom_l3_cache_pmu_get_slice_irq(struct platform_device *pdev,
>> +				    struct platform_device *sdev)
>> +{
>> +	int virq = platform_get_irq(sdev, 0);
>> +	struct gpio_desc *desc;
>> +
>> +	if (virq >= 0)
>> +		return virq;
>> +
>> +	desc = devm_gpiod_get_index(&sdev->dev, NULL, 0, GPIOD_ASIS);
>> +	if (IS_ERR(desc))
>> +		return -ENOENT;
>> +
>> +	return gpiod_to_irq(desc);
>> +}
>> +
> 
> As Marc Zyngier pointed out in another thread, you should represent 
> your
> interrupt controller as an interrupt controller rather than pretending 
> it is a
> GPIO controller.
> 
> Drivers should be able to remain blissfully unaware what the other end 
> of their
> interrupt line is wired up to, and shouldn't have to jump through hoops 
> like
> the above.

Understood. We need to close the loop with Rafael J. Wysocki w.r.t. ACPI 
support for multiple regular IRQ controllers similar to DT.

> 
> Thanks,
> Mark.

Thanks,
Agustin