[PATCH V1] perf: qcom: Add L3 cache PMU driver

agustinv at codeaurora.org
Mon Mar 21 08:56:59 PDT 2016


On 2016-03-21 05:04, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 04:37:02PM -0400, Agustin Vega-Frias wrote:
>> This adds a new dynamic PMU to the Perf Events framework to program
>> and control the L3 cache PMUs in some Qualcomm Technologies SOCs.
>> 
>> The driver supports a distributed cache architecture where the overall
>> cache is comprised of multiple slices each with its own PMU. The driver
>> aggregates counts across the whole system to provide a global picture
>> of the metrics selected by the user.
> 
> So is there never a situation where you want to profile just a single
> slice?

No. The slice that services a given access is selected by a hash of
the target address.
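
As an illustration only: the sketch below shows how an address hash can
scatter cache lines across slices, which is why the counts from one
slice cover an arbitrary subset of addresses. The XOR fold, the slice
count, the 64-byte line size, and the name addr_to_slice() are all
assumptions made for this example. It is not the hash the hardware
uses.

```c
#include <stdint.h>

#define NUM_SLICES       8u  /* hypothetical slice count (power of two) */
#define SLICE_BITS       3u  /* log2(NUM_SLICES) */
#define CACHE_LINE_SHIFT 6u  /* assumes 64-byte cache lines */

/*
 * Hypothetical hash: XOR-fold the cache-line address, SLICE_BITS bits
 * at a time, into a slice index in the range [0, NUM_SLICES).
 */
static unsigned int addr_to_slice(uint64_t phys_addr)
{
	uint64_t line = phys_addr >> CACHE_LINE_SHIFT;
	unsigned int slice = 0;

	while (line) {
		slice ^= (unsigned int)(line & (NUM_SLICES - 1u));
		line >>= SLICE_BITS;
	}
	return slice;
}
```

With a hash of this kind, bytes within one cache line always map to the
same slice, while neighbouring lines generally map to different slices.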

> 
> Is userspace at all aware of these slices through other means?

Userspace is not aware of the actual topology.

> 
> That is; typically we do not aggregate in-kernel like this but simply
> expose each slice as a separate PMU and let userspace sort things.

I chose a single PMU over multiple PMUs to reduce the overhead of
retrieving the system-wide counts, which would require multiple system
calls in the multiple-PMU case.
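
To make the cost concrete, here is a minimal sketch of what userspace
would do in the multiple-PMU case. It assumes one already-open event
file descriptor per slice, each returning a single 64-bit count per
read() (the default perf read format). The helper name
sum_slice_counts() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Sum the counts from n per-slice event file descriptors.
 * Issues one read() system call per slice.
 * Returns 0 on success, -1 if any read fails or is short.
 */
static int sum_slice_counts(const int *fds, size_t n, uint64_t *total)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t count;

		if (read(fds[i], &count, sizeof(count)) != (ssize_t)sizeof(count))
			return -1;
		sum += count;
	}
	*total = sum;
	return 0;
}
```

With in-kernel aggregation the same total comes back from a single
read() on one file descriptor, regardless of the number of slices.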


