Benchmarking [PATCH v5 00/14] SLUB percpu sheaves

Liam R. Howlett Liam.Howlett at oracle.com
Thu Sep 18 08:29:14 PDT 2025


* Uladzislau Rezki <urezki at gmail.com> [250918 07:50]:
> On Wed, Sep 17, 2025 at 04:59:41PM -0700, Suren Baghdasaryan wrote:
> > On Wed, Sep 17, 2025 at 9:14 AM Suren Baghdasaryan <surenb at google.com> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 10:19 PM Uladzislau Rezki <urezki at gmail.com> wrote:
> > > >
> > > > On Tue, Sep 16, 2025 at 10:09:18AM -0700, Suren Baghdasaryan wrote:
> > > > > On Mon, Sep 15, 2025 at 8:22 AM Vlastimil Babka <vbabka at suse.cz> wrote:
> > > > > >
> > > > > > On 9/15/25 14:13, Paul E. McKenney wrote:
> > > > > > > On Mon, Sep 15, 2025 at 09:51:25AM +0200, Jan Engelhardt wrote:
> > > > > > >>
> > > > > > >> On Saturday 2025-09-13 02:09, Sudarsan Mahendran wrote:
> > > > > > >> >
> > > > > > >> >Summary of the results:
> > > > > >
> > > > > > In any case, thanks a lot for the results!
> > > > > >
> > > > > > >> >- Significant change (meaning >10% difference
> > > > > > >> >  between base and experiment) on will-it-scale
> > > > > > >> >  tests in AMD.
> > > > > > >> >
> > > > > > >> >Summary of AMD will-it-scale test changes:
> > > > > > >> >
> > > > > > >> >Number of runs : 15
> > > > > > >> >Direction      : + is good
> > > > > > >>
> > > > > > >> If STDDEV grows more than mean, there is more jitter,
> > > > > > >> which is not "good".
> > > > > > >
> > > > > > > This is true.  On the other hand, the mean grew way more in absolute
> > > > > > > terms than did STDDEV.  So might this be a reasonable tradeoff?
> > > > > >
> > > > > > Also I'd point out that MIN of TEST is better than MAX of BASE, which means
> > > > > > there's always an improvement for this config. So jitter here means it's
> > > > > > changing between better and more better :) and not between worse and (more)
> > > > > > better.
> > > > > >
> > > > > > The annoying part of course is that for other configs it's consistently the
> > > > > > opposite.
> > > > >
> > > > > Hi Vlastimil,
> > > > > I ran my mmap stress test that runs 20000 cycles of mmapping 50 VMAs,
> > > > > faulting them in then unmapping and timing only mmap and munmap calls.
> > > > > This is not a realistic scenario but works well for A/B comparison.
> > > > >
> > > > > The numbers are below with sheaves showing a clear improvement:
> > > > >
> > > > > Baseline
> > > > >             avg             stdev
> > > > > mmap        2.621073        0.2525161631
> > > > > munmap      2.292965        0.008831973052
> > > > > total       4.914038        0.2572620923
> > > > >
> > > > > Sheaves
> > > > >             avg            stdev           avg_diff        stdev_diff
> > > > > mmap        1.561220667    0.07748897037   -40.44%        -69.31%
> > > > > munmap      2.042071       0.03603083448   -10.94%        307.96%
> > > > > total       3.603291667    0.113209047     -26.67%        -55.99%
> > > > >
> > > > Could you run your test with the below patch dropped?
> > >
> > > Sure, will try later today and report.
> > 
> > Sheaves with [04/23] patch reverted:
> > 
> >             avg             avg_diff
> > mmap        2.143948        -18.20%
> > munmap      2.343707        2.21%
> > total       4.487655        -8.68%
> > 
> With offloading over sheaves, mmap/munmap is faster; I assume it is
> because the same objects are reused from the sheaves after reclaim,
> whereas kvfree_rcu() just frees them.

Sorry, I am having trouble following where you think the speed up is
coming from.

Can you clarify what you mean by offloading and reclaim in this context?

Thanks,
Liam



