[PATCH 0/4] coresight: Add ETR-PERF polling.

Denis Nikitin denik at google.com
Fri May 14 02:02:25 PDT 2021


On Wed, May 5, 2021 at 8:29 AM Mathieu Poirier
<mathieu.poirier at linaro.org> wrote:
>
> On Tue, May 04, 2021 at 11:46:20PM -0700, Denis Nikitin wrote:
> > On Tue, Apr 27, 2021 at 9:04 AM Leo Yan <leo.yan at linaro.org> wrote:
> > >
> > > On Tue, Apr 27, 2021 at 09:47:46AM -0600, Mathieu Poirier wrote:
> > >
> > > [...]
> > >
> > > > > 2) ETR polling ensures that more trace is collected across the entire
> > > > > trace session - seeking to reduce inconsistent capture volumes.
> > > >
> > > > I am not convinced disabling a sink to collect traces while an
> > > > event is active is the right way to go.  To me it will add (more) complexity to
> > > > the coresight subsystem for very little gains, if any.
> > > >
> > > > If I remember correctly Leo brought forward the exact same idea about a year ago
> > > > and after discussion, we all agreed the benefit would not be important enough to
> > > > offset the drawbacks.
> > > >
> > > > As usual I am open to discussion and my opinion is not set in stone.  But as I
> > > > mentioned I worry the feature will increase complexity in the driver and
> > > > produce dubious results.  And we also have to factor in usability which, as
> > > > > Al pointed out, will be a problem.
> > >
> > > > Just want to point out one thing about ETR polling.  From one
> > > > perspective, the ETR polling mode is actually very similar to perf's
> > > > snapshot mode.  E.g. we can send the USR2 signal to the perf tool at a
> > > > specific interval to capture CoreSight trace data, so it can also
> > > > record the trace data continuously.
> > >
> > > > One benefit I can see from ETR polling mode is that it might introduce
> > > > less overhead than perf snapshot mode.  The kernel's mechanism (workqueue
> > > > or kernel thread) will be much more efficient than perf's signal handling
> > > > + SMP call with IPIs.
> > >
> > > > So it's good to first understand whether perf snapshot mode can meet
> > > > the requirement.
> >
> > We evaluated the patch on Chrome OS and I can confirm that the quality
> > of AutoFDO profiles greatly improved with the ETR polling.
> > Tested with per-thread and system-wide mode.
> >
> > Without ETR polling the size of the collected ETM data was very
> > inconsistent on the same workload and could vary by a factor of two.
> > This, in turn, affects the quality of the AutoFDO profiles generated from ETM.
> > With ETR polling the data size became pretty stable.
> > Performance evaluation shows a similar consistency in performance gain
> > of AutoFDO optimization.
> > This, I think, supports the idea that data collection right now is sensitive
> > to the process scheduling and can be improved with ETR polling.
> >
> > For the system-wide mode particularly we didn't see any other alternatives
> > to collect data periodically on a long-running workload.
> > We haven't tested snapshot mode though. The idea sounds interesting.
> > But small runtime overhead is crucial for the sampling profiler in the field
> > and if there is a noticeable difference we would incline towards the
> > ETR polling.
>
> Please see if Leo's approach[1], or any kind of extension to the current
> snapshot feature, would be a viable solution.  Reusing or extending code that is
> already there is always a better option.
>
> Thanks,
> Mathieu
>
> [1]. https://lists.linaro.org/pipermail/coresight/2021-April/006254.html
>

Hi Mathieu and Leo,

I did some evaluation of the snapshot mode.

Performance overhead is indeed higher than with the ETR polling patch.
Here are some numbers for comparison (measured on the Speedometer2
browser benchmark):
With a 100ms period, the runtime overhead of ETM tracing is less than
0.5% with ETR polling and 2.1% in snapshot mode.
With a 10ms period, it is 4.6% with ETR polling and 22% in snapshot mode.
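
For readers unfamiliar with snapshot mode, here is a minimal sketch of how
periodic snapshots can be driven from user space, as Leo described above.
It assumes perf record was started with --snapshot (-S), in which case perf
takes an AUX area snapshot each time it receives SIGUSR2. The function name
and its parameters are hypothetical and only for illustration; this is not
the script used to collect the numbers above.

```python
import os
import signal
import time


def send_periodic_snapshots(pid, period_s, count, send=os.kill, sleep=time.sleep):
    """Send SIGUSR2 to `pid` every `period_s` seconds, up to `count` times.

    `pid` is assumed to be a running `perf record --snapshot` process.
    Returns the number of signals sent; stops early if the process is gone.
    `send` and `sleep` are injectable so the loop can be tested without
    signalling a real process.
    """
    sent = 0
    for _ in range(count):
        sleep(period_s)
        try:
            send(pid, signal.SIGUSR2)  # ask perf to capture one snapshot
        except ProcessLookupError:
            break  # perf has exited, nothing left to signal
        sent += 1
    return sent
```

For example, send_periodic_snapshots(perf_pid, 0.1, 600) would request
snapshots at roughly a 100ms period for about one minute. Every snapshot
involves signal delivery and handling in perf, which is the per-snapshot
cost that the in-kernel polling avoids.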

We could probably use the ETM strobing feature to reduce the frequency
of data collection, but I see a problem when I use strobing together
with snapshot mode.
Within a minute of profiling the ETM generates a reasonable profile size
(with strobing autofdo,preset=9 with period 0x1000 it is up to 20MB).
But then the size grows disproportionately.
With a 4-minute run I got a 6.3GB profile.
I don't see such a problem with the ETR polling patch.

Leo, could you please take a look at this problem?

Thanks,
Denis


