[PATCH 00/94] Introducing the Maple Tree

Liam Howlett liam.howlett at oracle.com
Wed Apr 28 17:17:43 BST 2021


* Michel Lespinasse <walken.cr at gmail.com> [210428 11:49]:
> Hey Liam,
> 
> Thanks for copying me on this, and I will try to send some review
> comments before the end  of the week (I assume this is very similar to
> the linux-uek/howlett/maple/20210419 tag I was looking at recently ?)

Hi Michel,

Yes, I actually deleted and re-made that tag with one updated comment
and that should be it.

> 
> Just a quick note, I would prefer if you used my michel at lespinasse.org
> email for code postings (it goes to my local account, which is better
> suited for code reviews than gmail).

I will be sure to do that in the future.  I wasn't sure what account to
use and I noticed you had changed emails recently so I tried to update
the CC list.

Cheers,
Liam


> 
> On Wed, Apr 28, 2021 at 8:35 AM Liam Howlett <liam.howlett at oracle.com> wrote:
> >
> > The maple tree is an RCU-safe range based B-tree designed to use modern
> > processor cache efficiently.  There are a number of places in the kernel
> > that a non-overlapping range-based tree would be beneficial, especially
> > one with a simple interface.  The first user that is covered in this
> > patch set is the vm_area_struct, where three data structures are
> > replaced by the maple tree: the augmented rbtree, the vma cache, and the
> > linked list of VMAs in the mm_struct.  The long term goal is to reduce
> > or remove the mmap_sem contention.
> >
> > The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> > nodes.  With the increased branching factor, it is significantly shorter than
> > the rbtree so it has fewer cache misses.  The removal of the linked list
> > between subsequent entries also reduces the cache misses and the need to pull
> > in the previous and next VMA during many tree alterations.
> >
> > This patch is based on next-20210419
> >
> > Link:
> > https://github.com/oracle/linux-uek/releases/tag/howlett%2Fmaple%2F20210419
> >
> > Performance on a 144 core x86:
> >
> > While still using the mmap_sem, the performance seems fairly similar on
> > real-world workloads, while there are variations in micro-benchmarks.
> >
> > Increase in performance in the following micro-benchmarks in Hmean:
> > - wis brk1-threads: Disregard, this is useless.
> > - wis malloc1-threads: Increase of 15% to 500%
> > - wis page_fault1-threads: Increase of up to 15%
> >
> > Decrease in performance in the following micro-benchmarks in Hmean:
> > - wis brk1-processes: Decrease of 45% due to RCU required allocations
> > - wis signal1-threads: -3% to -20%
> >
> > Mixed:
> > - wis malloc1-processes: +9% to -20%
> > - wis pthread_mutex1-threads: +9% to -16%
> > - wis page_fault3-threads: +7% to -21%
> >
> >
> > kernbench:
> > Amean     user-2        880.66 (   0.00%)      885.49 *  -0.55%*
> > Amean     syst-2        143.88 (   0.00%)      152.64 *  -6.09%*
> > Amean     elsp-2        518.11 (   0.00%)      524.97 *  -1.32%*
> > Amean     user-4        903.74 (   0.00%)      908.41 *  -0.52%*
> > Amean     syst-4        149.61 (   0.00%)      158.42 *  -5.89%*
> > Amean     elsp-4        270.00 (   0.00%)      275.83 *  -2.16%*
> > Amean     user-8        955.25 (   0.00%)      961.88 *  -0.69%*
> > Amean     syst-8        158.76 (   0.00%)      169.04 *  -6.48%*
> > Amean     elsp-8        146.95 (   0.00%)      148.79 *  -1.25%*
> > Amean     user-16      1033.15 (   0.00%)     1037.42 *  -0.41%*
> > Amean     syst-16       168.01 (   0.00%)      177.40 *  -5.59%*
> > Amean     elsp-16        84.63 (   0.00%)       84.91 *  -0.33%*
> > Amean     user-32      1135.48 (   0.00%)     1130.56 *   0.43%*
> > Amean     syst-32       181.35 (   0.00%)      190.54 *  -5.06%*
> > Amean     elsp-32        50.53 (   0.00%)       51.88 *  -2.67%*
> > Amean     user-64      1191.24 (   0.00%)     1205.41 *  -1.19%*
> > Amean     syst-64       189.13 (   0.00%)      200.28 *  -5.89%*
> > Amean     elsp-64        31.75 (   0.00%)       31.95 *  -0.63%*
> > Amean     user-128     1608.94 (   0.00%)     1613.01 *  -0.25%*
> > Amean     syst-128      239.32 (   0.00%)      253.50 *  -5.93%*
> > Amean     elsp-128       25.34 (   0.00%)       26.02 *  -2.69%*
> >
> > gitcheckout:
> > Amean     User           0.00 (   0.00%)        0.00 *   0.00%*
> > Amean     System         8.33 (   0.00%)        7.86 *   5.59%*
> > Amean     Elapsed       21.85 (   0.00%)       21.34 *   2.29%*
> > Amean     CPU           98.80 (   0.00%)       99.00 *  -0.20%*
> >
> >
> > Patch organization:
> > Patches 1-20 introduce a new VMA API: lookup_vma().  This looks up the VMA at a
> > given address and returns either NULL or the VMA.  In looking at the VMA code,
> > find_vma() is often used to look up a vma and then check the limits (or
> > incorrectly assume that find_vma() does not continue searching upwards).
> > Initially, this is just a wrapper for find_vma_intersection() but becomes more
> > efficient once the maple tree is used.
> >
> > Patches 21-25 update the radix tree test suite to support what is necessary for
> > the maple tree.
> >
> > Patch 26 is the maple tree and test code: 36,152 lines are the
> > test_maple_tree.c file alone.
> >
> > Patches 27-33 are the removal of the rbtree from the mm_struct.
> >
> > Patches 34-44 are the removal of the vmacache from the kernel.
> >
> > Patches 45-51 are optimizations for using the maple tree directly.
> >
> > Patches 52-91 are the removal of the linked list from the mm_struct.
> >
> > Patches 92-94 are to do with cleaning up the locking around the splitting of
> > VMAs using the maple tree state.
> >
> >
> > Liam R. Howlett (94):
> >   mm: Add vma_lookup()
> >   drm/i915/selftests: Use vma_lookup() in __igt_mmap()
> >   arch/arc/kernel/troubleshoot: use vma_lookup() instead of
> >     find_vma_intersection()
> >   arch/arm64/kvm: Use vma_lookup() instead of find_vma_intersection()
> >   arch/powerpc/kvm/book3s_hv_uvmem: Use vma_lookup() instead of
> >     find_vma_intersection()
> >   arch/powerpc/kvm/book3s: Use vma_lookup() in
> >     kvmppc_hv_setup_htab_rma()
> >   arch/mips/kernel/traps: Use vma_lookup() instead of
> >     find_vma_intersection()
> >   x86/sgx: Use vma_lookup() in sgx_encl_find()
> >   virt/kvm: Use vma_lookup() instead of find_vma_intersection()
> >   vfio: Use vma_lookup() instead of find_vma_intersection()
> >   net/ipv5/tcp: Use vma_lookup() in tcp_zerocopy_receive()
> >   drm/amdgpu: Use vma_lookup() in amdgpu_ttm_tt_get_user_pages()
> >   media: videobuf2: Use vma_lookup() in get_vaddr_frames()
> >   misc/sgi-gru/grufault: Use vma_lookup() in gru_find_vma()
> >   kernel/events/uprobes: Use vma_lookup() in find_active_uprobe()
> >   lib/test_hmm: Use vma_lookup() in dmirror_migrate()
> >   mm/ksm: Use vma_lookup() in find_mergeable_vma()
> >   mm/migrate: Use vma_lookup() in do_pages_stat_array()
> >   mm/mremap: Use vma_lookup() in vma_to_resize()
> >   mm/memory.c: Use vma_lookup() instead of find_vma_intersection()
> >   radix tree test suite: Enhancements for Maple Tree
> >   radix tree test suite: Add support for fallthrough attribute
> >   radix tree test suite: Add support for kmem_cache_free_bulk
> >   radix tree test suite: Add keme_cache_alloc_bulk() support
> >   radix tree test suite: Add __must_be_array() support
> >   Maple Tree: Add new data structure
> >   mm: Start tracking VMAs with maple tree
> >   mm/mmap: Introduce unlock_range() for code cleanup
> >   mm/mmap: Change find_vma() to use the maple tree
> >   mm/mmap: Change find_vma_prev() to use maple tree
> >   mm/mmap: Change unmapped_area and unmapped_area_topdown to use maple
> >     tree
> >   kernel/fork: Convert dup_mmap to use maple tree
> >   mm: Remove rb tree.
> >   arch/m68k/kernel/sys_m68k: Use vma_lookup() in sys_cacheflush()
> >   xen/privcmd: Optimized privcmd_ioctl_mmap() by using vma_lookup()
> >   mm: Optimize find_exact_vma() to use vma_lookup()
> >   mm/khugepaged: Optimize collapse_pte_mapped_thp() by using
> >     vma_lookup()
> >   mm/gup: Add mm_populate_vma() for use when the vma is known
> >   mm/mmap: Change do_brk_flags() to expand existing VMA and add
> >     do_brk_munmap()
> >   mm/mmap: Change vm_brk_flags() to use mm_populate_vma()
> >   mm: Change find_vma_intersection() to maple tree and make find_vma()
> >     to inline.
> >   mm/mmap: Change mmap_region() to use maple tree state
> >   mm/mmap: Drop munmap_vma_range()
> >   mm: Remove vmacache
> >   mm/mmap: Change __do_munmap() to avoid unnecessary lookups.
> >   mm/mmap: Move mmap_region() below do_munmap()
> >   mm/mmap: Add do_mas_munmap() and wraper for __do_munmap()
> >   mmap: Use find_vma_intersection in do_mmap() for overlap
> >   mmap: Remove __do_munmap() in favour of do_mas_munmap()
> >   mm/mmap: Change do_brk_munmap() to use do_mas_align_munmap()
> >   mmap: make remove_vma_list() inline
> >   mm: Introduce vma_next() and vma_prev()
> >   arch/arm64: Remove mmap linked list from vdso.
> >   arch/parisc: Remove mmap linked list from kernel/cache
> >   arch/powerpc: Remove mmap linked list from mm/book3s32/tlb
> >   arch/powerpc: Remove mmap linked list from mm/book3s64/subpage_prot
> >   arch/s390: Use maple tree iterators instead of linked list.
> >   arch/x86: Use maple tree iterators for vdso/vma
> >   arch/xtensa: Use maple tree iterators for unmapped area
> >   drivers/misc/cxl: Use maple tree iterators for cxl_prefault_vma()
> >   drivers/tee/optee: Use maple tree iterators for __check_mem_type()
> >   fs/binfmt_elf: Use maple tree iterators for fill_files_note()
> >   fs/coredump: Use maple tree iterators in place of linked list
> >   fs/exec: Use vma_next() instead of linked list
> >   fs/proc/base: Use maple tree iterators in place of linked list
> >   fs/proc/task_mmu: Stop using linked list and highest_vm_end
> >   fs/userfaultfd: Stop using vma linked list.
> >   ipc/shm: Stop using the vma linked list
> >   kernel/acct: Use maple tree iterators instead of linked list
> >   kernel/events/core: Use maple tree iterators instead of linked list
> >   kernel/events/uprobes: Use maple tree iterators instead of linked list
> >   kernel/sched/fair: Use maple tree iterators instead of linked list
> >   kernel/sys: Use maple tree iterators instead of linked list
> >   mm/gup: Use maple tree navigation instead of linked list
> >   mm/huge_memory: Use vma_next() instead of vma linked list
> >   mm/khugepaged: Use maple tree iterators instead of vma linked list
> >   mm/ksm: Use maple tree iterators instead of vma linked list
> >   mm/madvise: Use vma_next instead of vma linked list
> >   mm/memcontrol: Stop using mm->highest_vm_end
> >   mm/mempolicy: Use maple tree iterators instead of vma linked list
> >   mm/mlock: Use maple tree iterators instead of vma linked list
> >   mm/mprotect: Use maple tree navigation instead of vma linked list
> >   mm/mremap: Use vma_next() instead of vma linked list
> >   mm/msync: Use vma_next() instead of vma linked list
> >   mm/oom_kill: Use maple tree iterators instead of vma linked list
> >   mm/pagewalk: Use vma_next() instead of vma linked list
> >   mm/swapfile: Use maple tree iterator instead of vma linked list
> >   mm/util: Remove __vma_link_list() and __vma_unlink_list()
> >   arch/um/kernel/tlb: Stop using linked list
> >   bpf: Remove VMA linked list
> >   mm: Remove vma linked list.
> >   mm: Return a bool from anon_vma_interval_tree_verify()
> >   mm/mmap: Add mas_split_vma() and use it for munmap()
> >   mm: Move mas locking outside of munmap() path.
> >
> >  Documentation/core-api/index.rst              |     1 +
> >  Documentation/core-api/maple-tree.rst         |    36 +
> >  MAINTAINERS                                   |    12 +
> >  arch/arc/kernel/troubleshoot.c                |     8 +-
> >  arch/arm64/kernel/vdso.c                      |     5 +-
> >  arch/arm64/kvm/mmu.c                          |     2 +-
> >  arch/m68k/kernel/sys_m68k.c                   |     4 +-
> >  arch/mips/kernel/traps.c                      |     4 +-
> >  arch/parisc/kernel/cache.c                    |    15 +-
> >  arch/powerpc/kvm/book3s_hv.c                  |     4 +-
> >  arch/powerpc/kvm/book3s_hv_uvmem.c            |     2 +-
> >  arch/powerpc/mm/book3s32/tlb.c                |     5 +-
> >  arch/powerpc/mm/book3s64/subpage_prot.c       |    15 +-
> >  arch/s390/configs/debug_defconfig             |     1 -
> >  arch/s390/mm/gmap.c                           |     8 +-
> >  arch/um/kernel/tlb.c                          |    16 +-
> >  arch/x86/entry/vdso/vma.c                     |    12 +-
> >  arch/x86/kernel/cpu/sgx/encl.h                |     4 +-
> >  arch/x86/kernel/tboot.c                       |     2 +-
> >  arch/xtensa/kernel/syscall.c                  |     4 +-
> >  drivers/firmware/efi/efi.c                    |     2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |     4 +-
> >  .../drm/i915/gem/selftests/i915_gem_mman.c    |     2 +-
> >  drivers/media/common/videobuf2/frame_vector.c |     2 +-
> >  drivers/misc/cxl/fault.c                      |     6 +-
> >  drivers/misc/sgi-gru/grufault.c               |     4 +-
> >  drivers/tee/optee/call.c                      |    15 +-
> >  drivers/vfio/vfio_iommu_type1.c               |     2 +-
> >  drivers/xen/privcmd.c                         |     2 +-
> >  fs/binfmt_elf.c                               |     5 +-
> >  fs/coredump.c                                 |    13 +-
> >  fs/exec.c                                     |     9 +-
> >  fs/proc/base.c                                |     7 +-
> >  fs/proc/task_mmu.c                            |    45 +-
> >  fs/proc/task_nommu.c                          |    55 +-
> >  fs/userfaultfd.c                              |    43 +-
> >  include/linux/maple_tree.h                    |   449 +
> >  include/linux/mm.h                            |    62 +-
> >  include/linux/mm_types.h                      |    33 +-
> >  include/linux/mm_types_task.h                 |     5 -
> >  include/linux/sched.h                         |     1 -
> >  include/linux/sched/mm.h                      |     3 +
> >  include/linux/vm_event_item.h                 |     4 -
> >  include/linux/vmacache.h                      |    28 -
> >  include/linux/vmstat.h                        |     6 -
> >  include/trace/events/maple_tree.h             |   227 +
> >  include/trace/events/mmap.h                   |    71 +
> >  init/main.c                                   |     2 +
> >  ipc/shm.c                                     |    13 +-
> >  kernel/acct.c                                 |     8 +-
> >  kernel/bpf/task_iter.c                        |     6 +-
> >  kernel/debug/debug_core.c                     |    12 -
> >  kernel/events/core.c                          |     7 +-
> >  kernel/events/uprobes.c                       |    29 +-
> >  kernel/fork.c                                 |    50 +-
> >  kernel/sched/fair.c                           |    14 +-
> >  kernel/sys.c                                  |     6 +-
> >  lib/Kconfig.debug                             |    15 +-
> >  lib/Makefile                                  |     3 +-
> >  lib/maple_tree.c                              |  6393 +++
> >  lib/test_hmm.c                                |     5 +-
> >  lib/test_maple_tree.c                         | 36152 ++++++++++++++++
> >  mm/Makefile                                   |     2 +-
> >  mm/debug.c                                    |    12 +-
> >  mm/gup.c                                      |    27 +-
> >  mm/huge_memory.c                              |     4 +-
> >  mm/init-mm.c                                  |     4 +-
> >  mm/internal.h                                 |    51 +-
> >  mm/interval_tree.c                            |     6 +-
> >  mm/khugepaged.c                               |    13 +-
> >  mm/ksm.c                                      |    32 +-
> >  mm/madvise.c                                  |     2 +-
> >  mm/memcontrol.c                               |     6 +-
> >  mm/memory.c                                   |    45 +-
> >  mm/mempolicy.c                                |    44 +-
> >  mm/migrate.c                                  |     4 +-
> >  mm/mlock.c                                    |    26 +-
> >  mm/mmap.c                                     |  2271 +-
> >  mm/mprotect.c                                 |     7 +-
> >  mm/mremap.c                                   |    17 +-
> >  mm/msync.c                                    |     2 +-
> >  mm/nommu.c                                    |   135 +-
> >  mm/oom_kill.c                                 |     5 +-
> >  mm/pagewalk.c                                 |     2 +-
> >  mm/swapfile.c                                 |     9 +-
> >  mm/util.c                                     |    32 -
> >  mm/vmacache.c                                 |   117 -
> >  mm/vmstat.c                                   |     4 -
> >  net/ipv4/tcp.c                                |     4 +-
> >  tools/testing/radix-tree/.gitignore           |     2 +
> >  tools/testing/radix-tree/Makefile             |    13 +-
> >  tools/testing/radix-tree/generated/autoconf.h |     1 +
> >  tools/testing/radix-tree/linux.c              |    78 +-
> >  tools/testing/radix-tree/linux/kernel.h       |    10 +
> >  tools/testing/radix-tree/linux/maple_tree.h   |     7 +
> >  tools/testing/radix-tree/linux/slab.h         |     2 +
> >  tools/testing/radix-tree/maple.c              |    59 +
> >  tools/testing/radix-tree/test.h               |     1 +
> >  .../radix-tree/trace/events/maple_tree.h      |     8 +
> >  virt/kvm/kvm_main.c                           |     2 +-
> >  100 files changed, 45347 insertions(+), 1699 deletions(-)
> >  create mode 100644 Documentation/core-api/maple-tree.rst
> >  create mode 100644 include/linux/maple_tree.h
> >  delete mode 100644 include/linux/vmacache.h
> >  create mode 100644 include/trace/events/maple_tree.h
> >  create mode 100644 lib/maple_tree.c
> >  create mode 100644 lib/test_maple_tree.c
> >  delete mode 100644 mm/vmacache.c
> >  create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
> >  create mode 100644 tools/testing/radix-tree/maple.c
> >  create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h
> >
> > --
> > 2.30.2


More information about the maple-tree mailing list