[linus:master] [maple_tree] 280b792cac: will-it-scale.per_process_ops 6.0% regression

Liam R. Howlett liam at infradead.org
Sat May 23 08:12:24 PDT 2026


On 26/05/23 03:51PM, Oliver Sang wrote:
> hi, Liam,
> 
> On Thu, May 21, 2026 at 11:45:05AM -0400, Liam R. Howlett wrote:
> > On 26/05/14 03:18PM, Oliver Sang wrote:
> > > hi, Liam,
> > > 
> > > On Wed, May 13, 2026 at 08:16:42PM -0400, Liam R. Howlett wrote:
> > > > On 26/05/13 03:40PM, kernel test robot wrote:
> > > > > 
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > kernel test robot noticed a 6.0% regression of will-it-scale.per_process_ops on:
> > > > > 
> > > > > 
> > > > > commit: 280b792cac62ddadca2935766ca870b438c86323 ("maple_tree: use maple copy node for mas_wr_split()")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > > 
> > > > > [still regression on linus/master      5d6919055dec134de3c40167a490f33c74c12581]
> > > > > [still regression on linux-next/master e98d21c170b01ddef366f023bbfcf6b31509fa83]
> > > > > 
> > > > > testcase: will-it-scale
> > > > > config: x86_64-rhel-9.4
> > > > > compiler: gcc-14
> > > > > test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > > > > parameters:
> > > > > 
> > > > > 	nr_task: 100%
> > > > > 	mode: process
> > > > > 	test: mmap2
> > > > > 	cpufreq_governor: performance
> > > > > 
> > > > > 
> > > > 
> > > > Thank you for the report.
> > > > 
> > > > 48 threads on a 2 socket E5-2697 v2 looks to be 12 cores (24 threads)
> > > > per cpu (so x2), or exactly one mmap2 process per hyperthread.
> > > 
> > > this is the cpu information:
> > > 
> > > Architecture:        x86_64
> > > CPU op-mode(s):      32-bit, 64-bit
> > > Byte Order:          Little Endian
> > > Address sizes:       46 bits physical, 48 bits virtual
> > > CPU(s):              48
> > > On-line CPU(s) list: 0-47
> > > Thread(s) per core:  2
> > > Core(s) per socket:  12
> > > Socket(s):           2
> > > NUMA node(s):        2
> > > Vendor ID:           GenuineIntel
> > > CPU family:          6
> > > Model:               62
> > > Model name:          Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
> > > Stepping:            4
> > > 
> > > > 
> > > > Is this across all process counts and peaks at 48, or just 48?
> > > 
> > > just 48. the run script is in
> > > https://download.01.org/0day-ci/archive/20260513/202605131554.92e7df6b-lkp@intel.com/repro-script
> > > 
> > > cd /lkp/benchmarks/will-it-scale
> > > python3 ./runtest.py mmap2 295 process 0 0 48
> > > 
> > > > Is this across many runs?
> > > 
> > > we run 6 times for both parent and this commit, the data looks stable
> > > 
> > > 11e7f22f5e85058b09ca90e74002a3b82f50e940/matrix.json:  "will-it-scale.per_process_ops": [
> > > 11e7f22f5e85058b09ca90e74002a3b82f50e940/matrix.json-    143595,
> > > 11e7f22f5e85058b09ca90e74002a3b82f50e940/matrix.json-    143474,
> > > 11e7f22f5e85058b09ca90e74002a3b82f50e940/matrix.json-    144104,
> > > 11e7f22f5e85058b09ca90e74002a3b82f50e940/matrix.json-    142796,
> > > 11e7f22f5e85058b09ca90e74002a3b82f50e940/matrix.json-    143081,
> > > 11e7f22f5e85058b09ca90e74002a3b82f50e940/matrix.json-    143623
> > > 11e7f22f5e85058b09ca90e74002a3b82f50e940/matrix.json-  ],
> > > 
> > > 
> > > 280b792cac62ddadca2935766ca870b438c86323/matrix.json:  "will-it-scale.per_process_ops": [
> > > 280b792cac62ddadca2935766ca870b438c86323/matrix.json-    134451,
> > > 280b792cac62ddadca2935766ca870b438c86323/matrix.json-    135089,
> > > 280b792cac62ddadca2935766ca870b438c86323/matrix.json-    135080,
> > > 280b792cac62ddadca2935766ca870b438c86323/matrix.json-    135039,
> > > 280b792cac62ddadca2935766ca870b438c86323/matrix.json-    134082,
> > > 280b792cac62ddadca2935766ca870b438c86323/matrix.json-    135301
> > > 280b792cac62ddadca2935766ca870b438c86323/matrix.json-  ],
> > > 
> > > 
> > > > 
> > > > My testing didn't produce anything like this.  I'll have a look into
> > > > this when I can, but there isn't anything obvious that sticks out as a
> > > > likely cause.
> > > 
> > > if you want us to test any debug patch, it will be our great pleasure. thanks!
> > 
> > It looks like the result of the shape of the tree changing.  Can you try
> > the attached patch against Linus' tree?
> 
> I applied your patch upon below mainline tip commit when I checked.
> 6779b50faa562 ("Merge tag 'pci-v7.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci")
> 
> then build kernels with attached config.
> 
> but found a big regression introduced by your patch.
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/mmap2/will-it-scale
> 
> commit:
>   6779b50faa562 ("Merge tag 'pci-v7.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci")
>   942596e1c1037 ("maple_tree: Restore old tree layout using new scatter-gather node copy")
> 
> 6779b50faa562e6c 942596e1c1037f014638541bce4
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    7023559           -42.9%    4007886        will-it-scale.48.processes
>     146323           -42.9%      83497        will-it-scale.per_process_ops
>    7023559           -42.9%    4007886        will-it-scale.workload


How many runs did you do?  Was it the same 6 runs with 48 processes on a
48 thread machine?  My results differ from this dramatically: I see a 13%
increase with the patch which, after subtracting the 6% regression, is
about a 7% gain over what existed before.
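
For reference, here is a rough sanity check of the numbers quoted in this
thread.  It recomputes the reported regression from the six per_process_ops
runs per commit and shows how a 13% gain and a 6% loss combine.  Treating the
13% as measured on top of the regressed kernel is an assumption made only for
this sketch, and the helper name pct_change is made up for illustration.

```python
from statistics import mean

# per_process_ops for the six runs quoted earlier in this thread
parent = [143595, 143474, 144104, 142796, 143081, 143623]  # 11e7f22f5e85
commit = [134451, 135089, 135080, 135039, 134082, 135301]  # 280b792cac62


def pct_change(before, after):
    """Relative change from before to after, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# Regression reported by the robot, recomputed from the raw runs
reported = pct_change(mean(parent), mean(commit))

# Assumption: the +13% is relative to the regressed kernel.
additive = 13.0 - 6.0                                # simple subtraction
compounded = ((1 - 0.06) * (1 + 0.13) - 1) * 100.0   # multiplicative view

print(f"reported regression: {reported:.1f}%")
print(f"additive estimate:   {additive:+.1f}%")
print(f"compounded estimate: {compounded:+.1f}%")
```

Either way of combining the two figures lands at a gain of roughly 6% to 7%
over the pre-regression baseline, nowhere near a 42.9% loss.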

> 
> 
> full comparison is as below [1]
> 
> however, it seems the performance regression is really recovered at commit
> 6779b50faa562, though the configs are not same.

I do not understand your statement here.

Are you saying that the current code isn't producing the regression
after 6779b50faa562?  Because 6779b50faa562 has nothing to do with my
code, so I don't understand why that affects the results at all.

> list the regression we reported
> for reference.

Are you listing the regression in the applied patch for reference, or
was the initial report for reference?

This also means that I don't need to work on this now, right?  Because
there is no regression?  I'd like to clarify that before I spend more
time trying to fix this, especially since I saw neither of these
regressions in my testing - that is, I cannot reproduce your findings in
either code base.

> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/mmap2/will-it-scale
> 
> commit: 
>   11e7f22f5e ("maple_tree: add cp_converged() helper")
>   280b792cac ("maple_tree: use maple copy node for mas_wr_split()")
> 
> 11e7f22f5e85058b 280b792cac62ddadca2935766ca 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>    6885401            -6.0%    6472359        will-it-scale.48.processes
>     143445            -6.0%     134840        will-it-scale.per_process_ops
>    6885401            -6.0%    6472359        will-it-scale.workload
> 
> 
> 
> [1]
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/mmap2/will-it-scale
> 
> commit:
>   6779b50faa562 ("Merge tag 'pci-v7.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci")
>   942596e1c1037 ("maple_tree: Restore old tree layout using new scatter-gather node copy")
> 
> 6779b50faa562e6c 942596e1c1037f014638541bce4
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    7023559           -42.9%    4007886        will-it-scale.48.processes
>     146323           -42.9%      83497        will-it-scale.per_process_ops
>    7023559           -42.9%    4007886        will-it-scale.workload
>       0.66           -12.7%       0.58        turbostat.IPC
>      18.34            +5.3%      19.31        turbostat.RAMWatt
>       0.42            +0.0        0.45        mpstat.cpu.all.irq%
>       2.66            +3.3        5.93        mpstat.cpu.all.soft%
>      14.68            -5.9        8.81        mpstat.cpu.all.usr%
>      14.62           -40.2%       8.75        vmstat.cpu.us
>    6222969           -10.4%    5575283        vmstat.memory.cache
>       7082           +38.9%       9835        vmstat.system.cs
>    2474044 ±  6%    +298.0%    9845581 ±  6%  numa-numastat.node0.local_node
>    2496926 ±  6%    +295.0%    9862610 ±  6%  numa-numastat.node0.numa_hit
>    2770972 ±  3%    +255.1%    9840248 ±  5%  numa-numastat.node1.local_node
>    2797815 ±  3%    +252.9%    9872775 ±  5%  numa-numastat.node1.numa_hit
>     224.50 ±  7%     +79.4%     402.75 ±  2%  perf-c2c.DRAM.local
>     130.88 ± 10%    +104.5%     267.62 ±  5%  perf-c2c.DRAM.remote
>     477.25 ±  8%    +306.6%       1940 ±  5%  perf-c2c.HITM.local
>      86.75 ± 17%    +190.5%     252.00 ±  5%  perf-c2c.HITM.remote
>       7083           +39.2%       9857        perf-stat.i.context-switches
>     243.71           -17.4%     201.31        perf-stat.i.cpu-migrations
>       7059           +39.2%       9824        perf-stat.ps.context-switches
>     242.83           -17.4%     200.61        perf-stat.ps.cpu-migrations
>    2432336 ±  3%     -30.0%    1703846 ± 12%  numa-meminfo.node1.Active
>    2432227 ±  3%     -30.0%    1703734 ± 12%  numa-meminfo.node1.Active(anon)
>     456558 ±  7%     -60.7%     179634 ± 20%  numa-meminfo.node1.Mapped
>       5874 ±  5%     -10.6%       5254 ±  6%  numa-meminfo.node1.PageTables
>     111108 ±  6%      +9.8%     121989 ±  6%  numa-meminfo.node1.SUnreclaim
>    2014977 ±  2%     -34.3%    1324406 ± 12%  numa-meminfo.node1.Shmem
>       0.26 ±  4%     -16.5%       0.22 ±  2%  perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>       0.26 ±  4%     -16.5%       0.22 ±  2%  perf-sched.total_sch_delay.average.ms
>      33.78 ±  5%     -41.3%      19.83 ±  2%  perf-sched.total_wait_and_delay.average.ms
>      25847 ±  5%     +73.3%      44801 ±  2%  perf-sched.total_wait_and_delay.count.ms
>      33.51 ±  5%     -41.5%      19.61 ±  2%  perf-sched.total_wait_time.average.ms
>      33.78 ±  5%     -41.3%      19.83 ±  2%  perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>      25847 ±  5%     +73.3%      44801 ±  2%  perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown]
>      33.51 ±  5%     -41.5%      19.61 ±  2%  perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
>    2718564           -24.0%    2067012        meminfo.Active
>    2718343           -24.0%    2066792        meminfo.Active(anon)
>    6132342           -10.5%    5486698        meminfo.Cached
>     504163 ±  2%     -55.1%     226137 ±  2%  meminfo.Mapped
>    7551345            -8.6%    6901581        meminfo.Memused
>      11026            -5.9%      10375        meminfo.PageTables
>     232673           +11.1%     258411        meminfo.SUnreclaim
>    2036572           -31.7%    1390925        meminfo.Shmem
>     321293            +7.5%     345548        meminfo.Slab
>    7698968            -8.1%    7076107        meminfo.max_used_kB
>    2496631 ±  6%    +295.0%    9862492 ±  6%  numa-vmstat.node0.numa_hit
>    2473748 ±  6%    +298.0%    9845463 ±  6%  numa-vmstat.node0.numa_local
>     608040 ±  3%     -30.0%     425868 ± 12%  numa-vmstat.node1.nr_active_anon
>     113181 ±  7%     -60.5%      44658 ± 20%  numa-vmstat.node1.nr_mapped
>       1466 ±  5%     -10.5%       1313 ±  6%  numa-vmstat.node1.nr_page_table_pages
>     503740 ±  2%     -34.3%     331036 ± 12%  numa-vmstat.node1.nr_shmem
>      27840 ±  6%      +9.9%      30585 ±  6%  numa-vmstat.node1.nr_slab_unreclaimable
>     608040 ±  3%     -30.0%     425868 ± 12%  numa-vmstat.node1.nr_zone_active_anon
>    2797544 ±  3%    +252.9%    9872464 ±  5%  numa-vmstat.node1.numa_hit
>    2770701 ±  3%    +255.1%    9839937 ±  5%  numa-vmstat.node1.numa_local
>      98407 ±  8%     -16.6%      82034 ± 12%  sched_debug.cfs_rq:/.avg_vruntime.stddev
>     584720 ± 55%     +82.9%    1069238 ± 20%  sched_debug.cfs_rq:/.left_deadline.stddev
>     584716 ± 55%     +82.9%    1069230 ± 20%  sched_debug.cfs_rq:/.left_vruntime.stddev
>     584717 ± 55%     +82.9%    1069231 ± 20%  sched_debug.cfs_rq:/.right_vruntime.stddev
>      98406 ±  8%     -16.6%      82034 ± 12%  sched_debug.cfs_rq:/.zero_vruntime.stddev
>       3696 ± 22%     -53.0%       1736 ± 28%  sched_debug.cpu.curr->pid.min
>       1080 ±  8%     +22.1%       1320 ±  7%  sched_debug.cpu.curr->pid.stddev
>      24404           +36.7%      33355        sched_debug.cpu.nr_switches.avg
>      35594 ±  5%     +26.6%      45073 ±  6%  sched_debug.cpu.nr_switches.max
>      19226           +47.9%      28429        sched_debug.cpu.nr_switches.min
>     679722           -24.0%     516709        proc-vmstat.nr_active_anon
>    1446664            +1.1%    1462897        proc-vmstat.nr_dirty_background_threshold
>    2896867            +1.1%    2929372        proc-vmstat.nr_dirty_threshold
>    1533216           -10.5%    1371700        proc-vmstat.nr_file_pages
>   14575614            +1.1%   14738179        proc-vmstat.nr_free_pages
>     126756 ±  2%     -55.4%      56505 ±  2%  proc-vmstat.nr_mapped
>       2757            -6.0%       2592        proc-vmstat.nr_page_table_pages
>     509273           -31.7%     347755        proc-vmstat.nr_shmem
>      22155            -1.7%      21783        proc-vmstat.nr_slab_reclaimable
>      58291           +10.7%      64524        proc-vmstat.nr_slab_unreclaimable
>     679722           -24.0%     516709        proc-vmstat.nr_zone_active_anon
>    5296571          +272.6%   19736811        proc-vmstat.numa_hit
>    5246846          +275.2%   19687252        proc-vmstat.numa_local
>    9465911          +307.2%   38547209        proc-vmstat.pgalloc_normal
>    8803386 ±  2%    +332.6%   38087100        proc-vmstat.pgfree
>      19.27            -5.7       13.54        perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>      15.16            -5.1       10.07        perf-profile.calltrace.cycles-pp.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>      54.50            -4.2       50.33        perf-profile.calltrace.cycles-pp.__mmap
>       8.54            -2.9        5.66        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
>       8.04            -2.8        5.22        perf-profile.calltrace.cycles-pp.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
>       9.39            -2.6        6.83        perf-profile.calltrace.cycles-pp.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       6.62            -2.5        4.12        perf-profile.calltrace.cycles-pp.zap_pmd_range.__zap_vma_range.unmap_vmas.unmap_region.vms_complete_munmap_vmas
>       8.69            -2.3        6.34        perf-profile.calltrace.cycles-pp.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap.vm_mmap_pgoff
>       8.36            -2.3        6.10        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region.do_mmap
>       5.66            -1.9        3.72        perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
>       3.90            -1.7        2.20        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__mmap
>       3.76            -1.6        2.13        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__munmap
>       2.99 ±  3%      -1.6        1.39        perf-profile.calltrace.cycles-pp.mas_preallocate.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       3.68            -1.3        2.39        perf-profile.calltrace.cycles-pp.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
>       3.38            -1.2        2.18        perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.free_pgtables.unmap_region.vms_complete_munmap_vmas
>       2.86            -1.0        1.81        perf-profile.calltrace.cycles-pp.free_pud_range.free_p4d_range.free_pgd_range.free_pgtables.unmap_region
>       6.25            -1.0        5.24        perf-profile.calltrace.cycles-pp.__get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>       5.95            -1.0        4.95        perf-profile.calltrace.cycles-pp.shmem_get_unmapped_area.__get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       2.77 ±  2%      -0.9        1.85 ±  2%  perf-profile.calltrace.cycles-pp.d_path.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region
>       2.41            -0.8        1.65        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.__zap_vma_range.unmap_vmas.unmap_region
>       1.53            -0.7        0.85        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__munmap
>       1.50            -0.6        0.85        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__mmap
>       4.86            -0.6        4.23        perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.shmem_get_unmapped_area.__get_unmapped_area.do_mmap.vm_mmap_pgoff
>       1.67 ±  4%      -0.6        1.05 ±  4%  perf-profile.calltrace.cycles-pp.prepend_path.d_path.perf_event_mmap_event.perf_event_mmap.__mmap_complete
>       1.37            -0.6        0.75        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__munmap
>       1.36            -0.6        0.76        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__mmap
>       1.78            -0.6        1.22        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region
>       1.69            -0.6        1.13        perf-profile.calltrace.cycles-pp.shmem_mmap_prepare.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       4.11            -0.5        3.60        perf-profile.calltrace.cycles-pp.vm_unmapped_area.arch_get_unmapped_area_topdown.shmem_get_unmapped_area.__get_unmapped_area.do_mmap
>       1.23            -0.5        0.72        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>       1.38            -0.5        0.89        perf-profile.calltrace.cycles-pp.touch_atime.shmem_mmap_prepare.__mmap_region.do_mmap.vm_mmap_pgoff
>       1.37            -0.5        0.89        perf-profile.calltrace.cycles-pp.mas_find.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
>       3.96            -0.5        3.49        perf-profile.calltrace.cycles-pp.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.shmem_get_unmapped_area.__get_unmapped_area
>       1.46            -0.5        1.01        perf-profile.calltrace.cycles-pp.mas_find.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       1.10            -0.4        0.67        perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       1.11            -0.4        0.73        perf-profile.calltrace.cycles-pp.unlink_file_vma_batch_process.free_pgtables.unmap_region.vms_complete_munmap_vmas.do_vmi_align_munmap
>       0.97            -0.4        0.60        perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.shmem_mmap_prepare.__mmap_region.do_mmap
>       0.95            -0.4        0.58        perf-profile.calltrace.cycles-pp.vma_merge_new_range.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       2.41            -0.3        2.07        perf-profile.calltrace.cycles-pp.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       1.71            -0.3        1.40        perf-profile.calltrace.cycles-pp.__build_id_parse.perf_event_mmap_event.perf_event_mmap.__mmap_complete.__mmap_region
>       0.80            -0.3        0.55        perf-profile.calltrace.cycles-pp.pte_offset_map_lock.zap_pte_range.zap_pmd_range.__zap_vma_range.unmap_vmas
>       1.20            -0.2        1.00        perf-profile.calltrace.cycles-pp.mas_store_gfp.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       0.81            -0.2        0.63        perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.__mmap_new_vma.__mmap_region.do_mmap
>       1.97            -0.2        1.79        perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region.do_mmap
>       0.76            -0.2        0.60        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.__mmap_region.do_mmap.vm_mmap_pgoff
>       1.15            -0.1        1.01        perf-profile.calltrace.cycles-pp.freader_fetch.__build_id_parse.perf_event_mmap_event.perf_event_mmap.__mmap_complete
>       1.48            -0.1        1.36        perf-profile.calltrace.cycles-pp.mas_rev_awalk.mas_empty_area_rev.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown
>       0.90            -0.1        0.78        perf-profile.calltrace.cycles-pp.freader_get_folio.freader_fetch.__build_id_parse.perf_event_mmap_event.perf_event_mmap
>       0.77            -0.1        0.66        perf-profile.calltrace.cycles-pp.khugepaged_enter_vma.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       1.93 ±  2%      -0.1        1.82        perf-profile.calltrace.cycles-pp.kmem_cache_free.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       2.20            -0.1        2.11        perf-profile.calltrace.cycles-pp.mas_empty_area_rev.unmapped_area_topdown.vm_unmapped_area.arch_get_unmapped_area_topdown.shmem_get_unmapped_area
>       0.66            -0.1        0.58 ±  2%  perf-profile.calltrace.cycles-pp.__filemap_get_folio_mpol.freader_get_folio.freader_fetch.__build_id_parse.perf_event_mmap_event
>       1.17 ±  3%      -0.1        1.10        perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.cmd_record
>       1.17 ±  3%      -0.1        1.10        perf-profile.calltrace.cycles-pp.cmd_record
>       1.17 ±  3%      -0.1        1.10        perf-profile.calltrace.cycles-pp.record__finish_output.cmd_record
>       5.13            +0.1        5.20        perf-profile.calltrace.cycles-pp.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>      44.84            +0.2       45.05        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       0.70            +0.4        1.07        perf-profile.calltrace.cycles-pp.mas_prev_slot.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       0.20 ±129%      +0.4        0.60 ±  2%  perf-profile.calltrace.cycles-pp.ordered_events__deliver_event.__ordered_events__flush.perf_session__process_user_event.perf_session__process_events.record__finish_output
>       0.22 ±129%      +0.4        0.64 ±  2%  perf-profile.calltrace.cycles-pp.__ordered_events__flush.perf_session__process_user_event.perf_session__process_events.record__finish_output.cmd_record
>       0.22 ±129%      +0.4        0.64 ±  2%  perf-profile.calltrace.cycles-pp.perf_session__process_user_event.perf_session__process_events.record__finish_output.cmd_record
>       0.13 ±173%      +0.5        0.59 ±  2%  perf-profile.calltrace.cycles-pp.perf_session__deliver_event.ordered_events__deliver_event.__ordered_events__flush.perf_session__process_user_event.perf_session__process_events
>      43.80            +0.5       44.30        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       0.00            +0.5        0.52 ±  2%  perf-profile.calltrace.cycles-pp.node_finalise.mas_wr_spanning_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
>       0.00            +0.6        0.56        perf-profile.calltrace.cycles-pp.memcpy_orig.node_copy.cp_data_write.mas_wr_spanning_store.mas_store_gfp
>       0.00            +0.6        0.58        perf-profile.calltrace.cycles-pp.memcpy_orig.node_copy.cp_data_write.mas_wr_split.mas_store_prealloc
>       0.89            +0.6        1.47        perf-profile.calltrace.cycles-pp.mas_find.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       0.00            +0.7        0.66        perf-profile.calltrace.cycles-pp.mas_next_node.mas_next_slot.mas_find.vms_gather_munmap_vmas.do_vmi_align_munmap
>       0.65            +0.7        1.31        perf-profile.calltrace.cycles-pp.mas_next_slot.mas_find.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
>       0.00            +0.7        0.67 ±  2%  perf-profile.calltrace.cycles-pp.__kfree_rcu_sheaf.kvfree_call_rcu.mas_topiary_replace.mas_wr_spanning_store.mas_store_gfp
>       0.00            +0.8        0.78        perf-profile.calltrace.cycles-pp.node_finalise.mas_wr_split.mas_store_prealloc.__mmap_new_vma.__mmap_region
>       0.00            +0.8        0.82 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core
>      42.40            +1.0       43.37        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
>       0.00            +1.0        1.00 ±  2%  perf-profile.calltrace.cycles-pp.kvfree_call_rcu.mas_topiary_replace.mas_wr_split.mas_store_prealloc.__mmap_new_vma
>       0.00            +1.3        1.25        perf-profile.calltrace.cycles-pp.kvfree_call_rcu.mas_topiary_replace.mas_wr_spanning_store.mas_store_gfp.do_vmi_align_munmap
>      41.16            +1.3       42.42        perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +1.3        1.26 ±  2%  perf-profile.calltrace.cycles-pp.__kmem_cache_free_bulk.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs
>       0.00            +1.3        1.31        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__prefill_sheaf_pfmemalloc
>       0.00            +1.4        1.40        perf-profile.calltrace.cycles-pp.rcu_free_sheaf.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd
>       0.00            +1.4        1.44        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__refill_objects_node.refill_objects.__prefill_sheaf_pfmemalloc.kmem_cache_prefill_sheaf
>       0.00            +1.5        1.45 ±  2%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn
>       0.00            +1.5        1.46 ±  2%  perf-profile.calltrace.cycles-pp.rcu_core.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread
>       0.00            +1.5        1.47        perf-profile.calltrace.cycles-pp.handle_softirqs.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
>       0.00            +1.5        1.47        perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      45.00            +1.5       46.50        perf-profile.calltrace.cycles-pp.__munmap
>       0.00            +1.5        1.50 ±  2%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.00            +1.7        1.73        perf-profile.calltrace.cycles-pp.node_copy.cp_data_write.mas_wr_spanning_store.mas_store_gfp.do_vmi_align_munmap
>       0.00            +1.7        1.73        perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>       0.00            +1.7        1.73        perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>       0.00            +1.7        1.73        perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>       0.00            +1.8        1.82        perf-profile.calltrace.cycles-pp.node_copy.cp_data_write.mas_wr_split.mas_store_prealloc.__mmap_new_vma
>       0.00            +2.0        1.95        perf-profile.calltrace.cycles-pp.dst_setup.mas_wr_spanning_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
>      33.18            +2.6       35.81        perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
>       0.00            +2.9        2.88        perf-profile.calltrace.cycles-pp.cp_data_write.mas_wr_spanning_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
>       0.00            +3.0        3.03        perf-profile.calltrace.cycles-pp.dst_setup.mas_wr_split.mas_store_prealloc.__mmap_new_vma.__mmap_region
>       0.00            +3.3        3.31        perf-profile.calltrace.cycles-pp.mas_topiary_replace.mas_wr_split.mas_store_prealloc.__mmap_new_vma.__mmap_region
>       0.00            +3.6        3.59        perf-profile.calltrace.cycles-pp.mas_topiary_replace.mas_wr_spanning_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
>       0.00            +3.9        3.87        perf-profile.calltrace.cycles-pp.__refill_objects_node.refill_objects.__prefill_sheaf_pfmemalloc.kmem_cache_prefill_sheaf.mas_store_gfp
>       0.00            +3.9        3.91        perf-profile.calltrace.cycles-pp.cp_data_write.mas_wr_split.mas_store_prealloc.__mmap_new_vma.__mmap_region
>       0.00            +4.6        4.64        perf-profile.calltrace.cycles-pp.refill_objects.__prefill_sheaf_pfmemalloc.kmem_cache_prefill_sheaf.mas_store_gfp.do_vmi_align_munmap
>       0.00            +4.7        4.71        perf-profile.calltrace.cycles-pp.__prefill_sheaf_pfmemalloc.kmem_cache_prefill_sheaf.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap
>       0.00            +5.1        5.10        perf-profile.calltrace.cycles-pp.kmem_cache_prefill_sheaf.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>      36.19            +5.6       41.79        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
>      36.00            +5.7       41.69        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      35.15            +6.0       41.13        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      35.01            +6.0       41.03        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      34.08            +6.3       40.39        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      32.43            +6.8       39.26        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
>      12.68            +8.9       21.54        perf-profile.calltrace.cycles-pp.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
>       4.77           +11.2       16.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap.vm_mmap_pgoff
>       7.25           +12.6       19.89        perf-profile.calltrace.cycles-pp.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>       0.00           +13.7       13.69        perf-profile.calltrace.cycles-pp.mas_wr_spanning_store.mas_store_gfp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       0.00           +14.8       14.81        perf-profile.calltrace.cycles-pp.mas_wr_split.mas_store_prealloc.__mmap_new_vma.__mmap_region.do_mmap
>       8.17            -8.2        0.00        perf-profile.children.cycles-pp.mas_wr_node_store

It looks like the slow path is taken 100% of the time in your run, for
all processes, always.

I understand why that is happening and I can work to avoid it (nudging
the write deeper into a split node), but I'd like clarification of my
questions above before I keep going here.

...

Thanks,
Liam


