[linus:master] [maple_tree] 280b792cac: will-it-scale.per_process_ops 6.0% regression

Mon May 25 00:23:32 PDT 2026

Hi Liam,

On Sat, May 23, 2026 at 11:12:24AM -0400, Liam R. Howlett wrote:

[...]

> > 
> > I applied your patch upon below mainline tip commit when I checked.
> > 6779b50faa562 ("Merge tag 'pci-v7.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci")
> > 
> > then build kernels with attached config.
> > 
> > but found a big regression introduced by your patch.
> > 
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> >   gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/mmap2/will-it-scale
> > 
> > commit:
> >   6779b50faa562 ("Merge tag 'pci-v7.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci")
> >   942596e1c1037 ("maple_tree: Restore old tree layout using new scatter-gather node copy")
> > 
> > 6779b50faa562e6c 942596e1c1037f014638541bce4
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >    7023559           -42.9%    4007886        will-it-scale.48.processes
> >     146323           -42.9%      83497        will-it-scale.per_process_ops
> >    7023559           -42.9%    4007886        will-it-scale.workload
> 
> 
> How many runs did you do?  The same 6 runs with 48 processes on a 48
> thread machine?  My results vary from this dramatically (+13% increase,
> minus the 6% regression so 7% gain over what existed before).

Right, on the same machine and with the same test. We ran it 8 times for both
6779b50faa562 and 942596e1c1037 (your patch), and the data is stable:

6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json:  "will-it-scale.per_process_ops": [
6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json-    146187,
6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json-    146227,
6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json-    146559,
6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json-    147168,
6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json-    146561,
6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json-    145606,
6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json-    146257,
6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json-    146025
6779b50faa562e6cca1aa6a4649a4d764c6c7e28/matrix.json-  ],

942596e1c1037f014638541bce448b5926be4fb6/matrix.json:  "will-it-scale.per_process_ops": [
942596e1c1037f014638541bce448b5926be4fb6/matrix.json-    83600,
942596e1c1037f014638541bce448b5926be4fb6/matrix.json-    83477,
942596e1c1037f014638541bce448b5926be4fb6/matrix.json-    83510,
942596e1c1037f014638541bce448b5926be4fb6/matrix.json-    83335,
942596e1c1037f014638541bce448b5926be4fb6/matrix.json-    83700,
942596e1c1037f014638541bce448b5926be4fb6/matrix.json-    83361,
942596e1c1037f014638541bce448b5926be4fb6/matrix.json-    83761,
942596e1c1037f014638541bce448b5926be4fb6/matrix.json-    83233
942596e1c1037f014638541bce448b5926be4fb6/matrix.json-  ],
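
For anyone who wants to re-check the summary numbers from the per-run values
above, here is a minimal sketch. It assumes the table reports the plain
arithmetic mean of the runs and the change relative to the base commit; the
function and variable names are only for illustration and are not part of our
tooling.

```python
from statistics import mean, stdev

# per_process_ops values copied from the two matrix.json excerpts above
BASE_6779B50 = [146187, 146227, 146559, 147168, 146561, 145606, 146257, 146025]
PATCHED_942596E = [83600, 83477, 83510, 83335, 83700, 83361, 83761, 83233]


def percent_change(base_runs, new_runs):
    """Change of the mean of new_runs relative to the mean of base_runs, in percent."""
    base_mean = mean(base_runs)
    return (mean(new_runs) - base_mean) / base_mean * 100.0


def relative_stddev(runs):
    """Sample standard deviation as a percentage of the mean (run-to-run noise)."""
    return stdev(runs) / mean(runs) * 100.0


if __name__ == "__main__":
    print(f"base mean:    {mean(BASE_6779B50):.0f}")
    print(f"patched mean: {mean(PATCHED_942596E):.0f}")
    print(f"change:       {percent_change(BASE_6779B50, PATCHED_942596E):.1f}%")
    print(f"base noise:   {relative_stddev(BASE_6779B50):.2f}%")
    print(f"patch noise:  {relative_stddev(PATCHED_942596E):.2f}%")
```

With these inputs the run-to-run noise stays well below 1% on both kernels,
which is why we consider the -42.9% difference stable.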

> 
> > 
> > 
> > full comparison is as below [1]
> > 
> > however, it seems the performance regression is really recovered at commit
> > 6779b50faa562, though the configs are not same.
> 
> I do not understand your statement here.
> 
> Are you saying that the current code isn't producing the regression
> after 6779b50faa562?  Because 6779b50faa562 has nothing to do with my
> code, so I don't understand why that affects the results at all.

This relates to our report criteria. When our bisect points to an 'fbc' (first
bad commit) in mainline that causes a performance regression compared to its
parent, we further check the mainline and linux-next/master tips to see whether
the data is still similar to the 'fbc', i.e. whether the regression still
persists in the latest code. So in our original report, you could see
statements as below:

[still regression on linus/master      5d6919055dec134de3c40167a490f33c74c12581]
[still regression on linux-next/master e98d21c170b01ddef366f023bbfcf6b31509fa83]

In order to avoid wrong reports as much as possible, if the regression does not
exist on the mainline or linux-next/master tip, we do not report it at all.

Since you asked us to apply your patch to the latest mainline, which was
6779b50faa562 when I checked, we tested the performance of 6779b50faa562 as
well and found that its data is very similar to 11e7f22f5e (parent of
280b792cac). That is the reason I said we cannot see the regression on the
latest mainline for now.
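
The check described above can be sketched roughly as follows. This is only an
illustration of the idea, not our actual code: the function names, the
dictionary of tips, and the -3% threshold are all assumptions made for the
example, and it reads "still regressed" as "still below the parent of the fbc
by at least the threshold".

```python
def pct_change(base_ops, new_ops):
    """Change of new_ops relative to base_ops, in percent."""
    return (new_ops - base_ops) / base_ops * 100.0


def still_regressed(parent_ops, candidate_ops, threshold_pct=-3.0):
    """True if candidate is slower than the fbc's parent by at least the
    (hypothetical) threshold."""
    return pct_change(parent_ops, candidate_ops) <= threshold_pct


def should_report(parent_ops, fbc_ops, tip_ops_by_branch, threshold_pct=-3.0):
    """Report only if the fbc regresses against its parent AND every checked
    tip (e.g. mainline, linux-next/master) still shows the regression."""
    if not still_regressed(parent_ops, fbc_ops, threshold_pct):
        return False
    return all(
        still_regressed(parent_ops, ops, threshold_pct)
        for ops in tip_ops_by_branch.values()
    )


if __name__ == "__main__":
    # per_process_ops from this thread: parent 11e7f22f5e, fbc 280b792cac,
    # and mainline tip 6779b50faa562 (tested with a different config, so the
    # comparison is illustrative only).
    print(should_report(143445, 134840, {"mainline": 146323}))
```

With the numbers from this thread, the tip 6779b50faa562 is not below the
parent of 280b792cac, so under this sketch no report would be sent for the
current mainline.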

> 
> > list the regression we reported
> > for reference.
> 
> Are you listing the regression in the applied patch for reference, or
> was the initial report for reference?

I listed both.

Below is the result for what you asked us to test.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/mmap2/will-it-scale

commit:
  6779b50faa562 ("Merge tag 'pci-v7.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci")
  942596e1c1037 ("maple_tree: Restore old tree layout using new scatter-gather node copy")

6779b50faa562e6c 942596e1c1037f014638541bce4
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   7023559           -42.9%    4007886        will-it-scale.48.processes
    146323           -42.9%      83497        will-it-scale.per_process_ops
   7023559           -42.9%    4007886        will-it-scale.workload

Below is from the original report, which I think may be easier to compare with
the above.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/mmap2/will-it-scale

commit: 
  11e7f22f5e ("maple_tree: add cp_converged() helper")
  280b792cac ("maple_tree: use maple copy node for mas_wr_split()")

11e7f22f5e85058b 280b792cac62ddadca2935766ca 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   6885401            -6.0%    6472359        will-it-scale.48.processes
    143445            -6.0%     134840        will-it-scale.per_process_ops
   6885401            -6.0%    6472359        will-it-scale.workload

> 
> This also means that I don't need to work on this now, right?  Because
> there is no regression?

Purely from the perspective of our test results, you are right.

> I'd like to clarify that before I spend more
> time trying to fix this, especially since I saw neither of these
> regressions in my testing - that is, I cannot reproduce your findings in
> either code base.

[...]

> 
> >       0.00           +14.8       14.81        perf-profile.calltrace.cycles-pp.mas_wr_split.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap
> >       8.17            -8.2        0.00        perf-profile.children.cycles-pp.mas_wr_node_store
> 
> It looks like the slow path is taken 100% of the time in your run, for
> all processes, always.
> 
> I understand why that is happening and I can work to avoid it (nudging
> the write deeper into a split node), but I'd like clarification of my
> questions above before I keep going here.

Sorry that we cannot supply a deep enough technical analysis, such as why the
mainline tip (at least 6779b50faa562) no longer shows the regression.

But in case you want to dig into this, you could ask us to test any fix/debug
patch, whether on top of your last patch, the current mainline tip, or
280b792cac. We just want to provide any assistance we can to improve Linux
kernel code quality. Thanks.

> 
> ...
> 
> Thanks,
> Liam
>