[PATCH v1] perf sample: Make user_regs and intr_regs optional
Namhyung Kim
namhyung at kernel.org
Mon Feb 10 18:50:57 PST 2025
On Mon, Feb 10, 2025 at 10:15:22AM -0800, Ian Rogers wrote:
> On Mon, Jan 13, 2025 at 11:43 AM Ian Rogers <irogers at google.com> wrote:
> >
> > The struct dump_regs contains 512 bytes of cache_regs, meaning the two
> > values in perf_sample contribute 1088 bytes of its total 1384 bytes
> > size. Initializing this much memory has a cost reported by Tavian
> > Barnes <tavianator at tavianator.com> as about 2.5% when running `perf
> > script --itrace=i0`:
> > https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/
> >
> > Adrian Hunter <adrian.hunter at intel.com> replied that the zero
> > initialization was necessary and couldn't simply be removed.
> >
> > This patch aims to strike a middle ground of still zeroing the
> > perf_sample, but removing 79% of its size by making user_regs and
> > intr_regs optional pointers to zalloc-ed memory. To support the
> > allocation, accessors are created for user_regs and intr_regs. To
> > support correct cleanup, perf_sample__init and perf_sample__exit
> > functions are created and added throughout the code base.
>
> Ping. Given the memory savings and performance wins it would be nice
> to see this land. Andi Kleen commented on doing a reimplementation,
> which is fine but out-of-scope of what I'm doing here.

Yeah, I like the core of the change. Andi's concern is that it touches
too many places. It'd be nice if we could do that without allocating
memory for regs and without needing perf_sample__{init,exit}. But I'm
not sure if it's possible.
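
For anyone following the thread, the approach described in the quoted
patch text can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code from the patch:
the struct layouts and the *_sketch names are hypothetical stand-ins,
and calloc() stands in for perf's zalloc(). The register block is
sized so that two of them come to 1088 bytes on a typical LP64
target, matching the figure quoted above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the register dump.
 * 8 + 8 + 8 + 512 + 8 = 544 bytes on a typical LP64 target,
 * so two embedded copies would cost 1088 bytes.
 */
struct regs_sketch {
	uint64_t abi;
	uint64_t mask;
	uint64_t *regs;
	uint64_t cache_regs[64];	/* 64 * 8 = 512 bytes */
	uint64_t cache_mask;
};

/* Hypothetical sample: the two register blocks become optional pointers. */
struct sample_sketch {
	uint64_t ip;
	struct regs_sketch *user_regs;	/* NULL until first accessed */
	struct regs_sketch *intr_regs;	/* NULL until first accessed */
};

/* Zero the (now small) sample; no register memory is touched here. */
static void sample_sketch__init(struct sample_sketch *s)
{
	memset(s, 0, sizeof(*s));
}

/* Accessor: allocate zeroed storage on first use; may return NULL. */
static struct regs_sketch *sample_sketch__user_regs(struct sample_sketch *s)
{
	if (!s->user_regs)
		s->user_regs = calloc(1, sizeof(*s->user_regs));
	return s->user_regs;
}

static struct regs_sketch *sample_sketch__intr_regs(struct sample_sketch *s)
{
	if (!s->intr_regs)
		s->intr_regs = calloc(1, sizeof(*s->intr_regs));
	return s->intr_regs;
}

/* Release whatever the accessors allocated; free(NULL) is a no-op. */
static void sample_sketch__exit(struct sample_sketch *s)
{
	free(s->user_regs);
	free(s->intr_regs);
	s->user_regs = NULL;
	s->intr_regs = NULL;
}
```

The cost of this shape is the one raised in this thread: every place
that declares a sample has to pair an init with an exit, which is why
the change ends up touching so many call sites.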

Thanks,
Namhyung