[PATCH v3] mm: introduce reference pages

Sat Jun 19 02:20:40 PDT 2021

[Apologies for the delay in getting back to you; other work ended up
taking priority and now I'm back to looking at this.]

On Tue, Aug 18, 2020 at 11:25 AM John Hubbard <jhubbard at nvidia.com> wrote:
>
> On 8/17/20 8:00 PM, Matthew Wilcox wrote:
> > On Mon, Aug 17, 2020 at 07:31:39PM -0700, John Hubbard wrote:
> >>>             Real time (s)    Max RSS (KiB)
> >>> anon        2.237081         107088
> >>> memset      2.252241         112180
> >>> refpage     2.243786         107128
> >>>
> >>> We can see that RSS for refpage is almost the same as anon, and real
> >>> time overhead is 44% that of memset.
> >>>
> >>
> >> Are some of the numbers stale, maybe? Try as I might, I cannot combine
> >> anything above to come up with 44%. :)
> >
> > You're not trying hard enough ;-)
> >
> > (2.252241 - 2.237081) / 2.237081 = .00677668801442594166
> > (2.243786 - 2.237081) / 2.237081 = .00299720930981041812
> > .00299720930981041812 / .00677668801442594166 = .44228232189973614648
> >
> > tadaa!
>
> haha, OK then! :) Next time I may try harder, but on the other hand my
> interpretation of the results is still "this is a small effect", even
> if there is a way to make it sound large by comparing the 3rd significant
> digits of the results...
>
> >
> > As I said last time this was posted, I'm just not excited by this.  We go
> > from having a 0.68% time overhead down to an 0.30% overhead, which just
> > doesn't move the needle for me.  Maybe there's a better benchmark than
> > this to show benefits from this patchset.
> >
>

Remember that this is a "realistic" benchmark, so it's doing plenty of
other work besides faulting pages. So I don't think we should expect
to see a massive improvement here.

I ran the pdfium benchmark again but I couldn't see the same
improvements that I got last time. This seems to be because pdfium has
since switched to its own allocator, bypassing the system allocator. I
think the gains should be larger with the memset optimization that
I've implemented, but I'm still in the process of finding a suitable
realistic benchmark that uses the system allocator.

But I would find a 0.4% perf improvement convincing enough,
personally, given that the workload is realistic. Consider a certain
large company which spends $billions annually on data centers. In that
environment a 0.4% performance improvement on realistic workloads can
translate to $millions of savings. And that's not taking into account
the memory savings which are important both in mobile environments and
in data centers.

> Yes, I wonder if there is an artificial workload that just uses refpages
> really extensively, maybe we can get some good solid improvements shown
> with that? Otherwise, it seems like we've just learned that memset is
> actually pretty good in this case. :)

Yes, it's possible to see the performance improvement here more
clearly with a microbenchmark. I've updated the commit message in v4
to include a microbenchmark program and some performance numbers from
it.

Peter