[PATCH v10 05/14] mm: multi-gen LRU: groundwork
Andrew Morton
akpm at linux-foundation.org
Tue Apr 26 18:34:48 PDT 2022
On Tue, 26 Apr 2022 19:18:21 -0600 Yu Zhao <yuzhao at google.com> wrote:
> > For example, lru_gen_add_folio() is huge and has 4(?) call sites. This
> > may well produce slower code due to the icache footprint.
> >
> > Experiment: moving lru_gen_del_folio() into mm/vmscan.c shrinks that
> > file's .text from 80612 bytes to 78956.
> >
> > I tend to think that out-of-line regular old C functions should be the
> > default and that the code should be inlined only when a clear benefit
> > is demonstrable, or has at least been seriously thought about.
>
> I can move those functions to vmscan.c if you think it would improve
> performance. I don't have a strong opinion here -- I was able to
> measure the bloat but not the performance impact.

This seems to be more an act of faith than anything else. Any
difference is unlikely to be measurable.

If there is a difference, the inlined version should win on
microbenchmarks, because all four copies of the function will stay in
cache. But a more realistic, broader test might suffer a slowdown from
having to pull the larger text into the icache more frequently. And
inter-build alignment changes seem to make a larger difference than
anything else, which confounds measurement attempts.
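
For readers following the thread, here is a minimal sketch of the two choices being weighed. It uses hypothetical stub types (`struct folio_stub`, `struct lru_list_stub`) and hypothetical functions (`lru_add_inline()`, `lru_add_outofline()`), not the real mm/ structures or lru_gen_add_folio() itself. A `static inline` body is copied into every caller, so four call sites mean four copies in .text. An out-of-line definition exists once, and callers only see a prototype and emit a call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's folio and LRU list types. */
struct lru_list_stub {
	long nr_pages;
};

struct folio_stub {
	bool on_lru;
	long nr_pages;
};

/*
 * Option 1: header-style static inline. Every call site gets its own
 * expanded copy of this body, which grows .text with each caller.
 */
static inline void lru_add_inline(struct lru_list_stub *lru,
				  struct folio_stub *folio)
{
	assert(!folio->on_lru);	/* stand-in for a VM_WARN_ON-style check */
	folio->on_lru = true;
	lru->nr_pages += folio->nr_pages;
}

/*
 * Option 2: only this prototype would live in the header...
 */
void lru_add_outofline(struct lru_list_stub *lru, struct folio_stub *folio);

/*
 * ...and the single definition would live in a .c file (for example
 * mm/vmscan.c). noinline is a GCC/Clang attribute that stops the
 * compiler from inlining it anyway within this one translation unit.
 */
__attribute__((noinline)) void lru_add_outofline(struct lru_list_stub *lru,
						 struct folio_stub *folio)
{
	assert(!folio->on_lru);
	folio->on_lru = true;
	lru->nr_pages += folio->nr_pages;
}
```

Both variants behave identically; the difference is only in code layout. Building an object that calls each variant from several sites and comparing the .text sizes reported by `size` is one way to see the footprint difference, but as the thread notes, any runtime effect is much harder to measure.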