[PATCH v10 05/14] mm: multi-gen LRU: groundwork

Andrew Morton akpm at linux-foundation.org
Tue Apr 26 18:34:48 PDT 2022


On Tue, 26 Apr 2022 19:18:21 -0600 Yu Zhao <yuzhao at google.com> wrote:

> > For example, lru_gen_add_folio() is huge and has 4(?) call sites.  This
> > may well produce slower code due to the icache footprint.
> >
> > Experiment: moving lru_gen_del_folio() into mm/vmscan.c shrinks that
> > file's .text from 80612 bytes to 78956.
> >
> > I tend to think that out-of-line regular old C functions should be the
> > default and that the code should be inlined only when a clear benefit
> > is demonstrable, or has at least been seriously thought about.
> 
> I can move those functions to vmscan.c if you think it would improve
> performance. I don't have a strong opinion here -- I was able to
> measure the bloat but not the performance impact.

This seems to be more an act of faith than anything else.  Unlikely
that any difference will be measurable.

If there is a difference, the inlined version should win on
microbenchmarks because all four copies of the function will be in
cache.  But a more realistic, broader test might suffer a slowdown
because the larger text has to be fetched into the icache more
frequently.  And inter-build alignment changes seem to make a larger
difference than anything else, thus confounding measurement attempts.
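
To make the tradeoff concrete, here is a minimal userspace sketch, not
kernel code.  The struct and function names are hypothetical stand-ins
for a helper with four call sites, and the attributes assume GCC or
Clang.  The inlined variant is expanded at each call site, while the
out-of-line variant has a single body in .text and each call site
emits a call.

```c
/* Hypothetical stand-in for a per-generation counter; not a kernel type. */
struct gen_counter {
	long nr[4];
};

/* Inlined variant: the body is expanded at every call site. */
static inline __attribute__((always_inline))
void gen_add_inline(struct gen_counter *c, int gen, long delta)
{
	c->nr[gen & 3] += delta;
}

/* Out-of-line variant: one copy of the body, reached via a call. */
__attribute__((noinline))
void gen_add_outofline(struct gen_counter *c, int gen, long delta)
{
	c->nr[gen & 3] += delta;
}

/* Four call sites using the inlined helper. */
long sum_inline_sites(void)
{
	struct gen_counter c = { { 0, 0, 0, 0 } };

	gen_add_inline(&c, 0, 1);
	gen_add_inline(&c, 1, 2);
	gen_add_inline(&c, 2, 3);
	gen_add_inline(&c, 3, 4);
	return c.nr[0] + c.nr[1] + c.nr[2] + c.nr[3];
}

/* The same four call sites using the out-of-line helper. */
long sum_outofline_sites(void)
{
	struct gen_counter c = { { 0, 0, 0, 0 } };

	gen_add_outofline(&c, 0, 1);
	gen_add_outofline(&c, 1, 2);
	gen_add_outofline(&c, 2, 3);
	gen_add_outofline(&c, 3, 4);
	return c.nr[0] + c.nr[1] + c.nr[2] + c.nr[3];
}
```

Both variants compute the same result; only the code layout differs.
With a body this small the inlined form may well be the smaller one, so
the size effect only shows up for a large helper.  Building the object
with and without the inlining and comparing the output of size(1) is
one way to see the .text difference for a real function.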


