[PATCH 00/94] Introducing the Maple Tree
Liam Howlett
liam.howlett at oracle.com
Wed Apr 28 17:17:43 BST 2021
* Michel Lespinasse <walken.cr at gmail.com> [210428 11:49]:
> Hey Liam,
>
> Thanks for copying me on this, and I will try to send some review
> comments before the end of the week (I assume this is very similar to
> the linux-uek/howlett/maple/20210419 tag I was looking at recently ?)
Hi Michel,
Yes, I actually deleted and re-made that tag with one updated comment
and that should be it.
>
> Just a quick note, I would prefer if you used my michel at lespinasse.org
> email for code postings (it goes to my local account, which is better
> suited for code reviews than gmail).
I will be sure to do that in the future. I wasn't sure what account to
use and I noticed you had changed emails recently so I tried to update
the CC list.
Cheers,
Liam
>
> On Wed, Apr 28, 2021 at 8:35 AM Liam Howlett <liam.howlett at oracle.com> wrote:
> >
> > The maple tree is an RCU-safe range based B-tree designed to use modern
> > processor cache efficiently. There are a number of places in the kernel
> > that a non-overlapping range-based tree would be beneficial, especially
> > one with a simple interface. The first user that is covered in this
> > patch set is the vm_area_struct, where three data structures are
> > replaced by the maple tree: the augmented rbtree, the vma cache, and the
> > linked list of VMAs in the mm_struct. The long term goal is to reduce
> > or remove the mmap_sem contention.
> >
> > The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> > nodes. With the increased branching factor, it is significantly shorter than
> > the rbtree so it has fewer cache misses. The removal of the linked list
> > between subsequent entries also reduces the cache misses and the need to pull
> > in the previous and next VMA during many tree alterations.
> >
> > This patch is based on next-20210419
> >
> > Link:
> > https://github.com/oracle/linux-uek/releases/tag/howlett%2Fmaple%2F20210419
> >
> > Performance on a 144 core x86:
> >
> > While still using the mmap_sem, the performance seems fairly similar on
> > real-world workloads, while there are variations in micro-benchmarks.
> >
> > Increase in performance in the following micro-benchmarks in Hmean:
> > - wis brk1-threads: Disregard, this is useless.
> > - wis malloc1-threads: Increase of 15% to 500%
> > - wis page_fault1-threads: Increase of up to 15%
> >
> > Decrease in performance in the following micro-benchmarks in Hmean:
> > - wis brk1-processes: Decrease of 45% due to RCU required allocations
> > - wis signal1-threads: -3% to -20%
> >
> > Mixed:
> > - wis malloc1-processes: +9% to -20%
> > - wis pthread_mutex1-threads: +9% to -16%
> > - wis page_fault3-threads: +7% to -21%
> >
> >
> > kernbench:
> > Amean user-2 880.66 ( 0.00%) 885.49 * -0.55%*
> > Amean syst-2 143.88 ( 0.00%) 152.64 * -6.09%*
> > Amean elsp-2 518.11 ( 0.00%) 524.97 * -1.32%*
> > Amean user-4 903.74 ( 0.00%) 908.41 * -0.52%*
> > Amean syst-4 149.61 ( 0.00%) 158.42 * -5.89%*
> > Amean elsp-4 270.00 ( 0.00%) 275.83 * -2.16%*
> > Amean user-8 955.25 ( 0.00%) 961.88 * -0.69%*
> > Amean syst-8 158.76 ( 0.00%) 169.04 * -6.48%*
> > Amean elsp-8 146.95 ( 0.00%) 148.79 * -1.25%*
> > Amean user-16 1033.15 ( 0.00%) 1037.42 * -0.41%*
> > Amean syst-16 168.01 ( 0.00%) 177.40 * -5.59%*
> > Amean elsp-16 84.63 ( 0.00%) 84.91 * -0.33%*
> > Amean user-32 1135.48 ( 0.00%) 1130.56 * 0.43%*
> > Amean syst-32 181.35 ( 0.00%) 190.54 * -5.06%*
> > Amean elsp-32 50.53 ( 0.00%) 51.88 * -2.67%*
> > Amean user-64 1191.24 ( 0.00%) 1205.41 * -1.19%*
> > Amean syst-64 189.13 ( 0.00%) 200.28 * -5.89%*
> > Amean elsp-64 31.75 ( 0.00%) 31.95 * -0.63%*
> > Amean user-128 1608.94 ( 0.00%) 1613.01 * -0.25%*
> > Amean syst-128 239.32 ( 0.00%) 253.50 * -5.93%*
> > Amean elsp-128 25.34 ( 0.00%) 26.02 * -2.69%*
> >
> > gitcheckout:
> > Amean User 0.00 ( 0.00%) 0.00 * 0.00%*
> > Amean System 8.33 ( 0.00%) 7.86 * 5.59%*
> > Amean Elapsed 21.85 ( 0.00%) 21.34 * 2.29%*
> > Amean CPU 98.80 ( 0.00%) 99.00 * -0.20%*
> >
> >
> > Patch organization:
> > Patches 1-20 introduce a new VMA API: lookup_vma(). This looks up the VMA at a
> > given address and returns either NULL or the VMA. In looking at the VMA code,
> > find_vma() is often used to look up a vma and then check the limits (or
> > incorrectly assume that find_vma() does not continue searching upwards).
> > Initially, this is just a wrapper for find_vma_intersection() but becomes more
> > efficient once the maple tree is used.
> >
> > Patches 21-25 update the radix tree test suite to support what is necessary for
> > the maple tree.
> >
> > Patch 26 is the maple tree and test code: 36,152 lines are the
> > test_maple_tree.c file alone.
> >
> > Patches 27-33 are the removal of the rbtree from the mm_struct.
> >
> > Patches 34-44 are the removal of the vmacache from the kernel.
> >
> > Patches 45-51 are optimizations for using the maple tree directly.
> >
> > Patches 52-91 are the removal of the linked list from the mm_struct.
> >
> > Patches 92-94 are to do with cleaning up the locking around the splitting of
> > VMAs using the maple tree state.
> >
> >
> > Liam R. Howlett (94):
> > mm: Add vma_lookup()
> > drm/i915/selftests: Use vma_lookup() in __igt_mmap()
> > arch/arc/kernel/troubleshoot: use vma_lookup() instead of
> > find_vma_intersection()
> > arch/arm64/kvm: Use vma_lookup() instead of find_vma_intersection()
> > arch/powerpc/kvm/book3s_hv_uvmem: Use vma_lookup() instead of
> > find_vma_intersection()
> > arch/powerpc/kvm/book3s: Use vma_lookup() in
> > kvmppc_hv_setup_htab_rma()
> > arch/mips/kernel/traps: Use vma_lookup() instead of
> > find_vma_intersection()
> > x86/sgx: Use vma_lookup() in sgx_encl_find()
> > virt/kvm: Use vma_lookup() instead of find_vma_intersection()
> > vfio: Use vma_lookup() instead of find_vma_intersection()
> > net/ipv5/tcp: Use vma_lookup() in tcp_zerocopy_receive()
> > drm/amdgpu: Use vma_lookup() in amdgpu_ttm_tt_get_user_pages()
> > media: videobuf2: Use vma_lookup() in get_vaddr_frames()
> > misc/sgi-gru/grufault: Use vma_lookup() in gru_find_vma()
> > kernel/events/uprobes: Use vma_lookup() in find_active_uprobe()
> > lib/test_hmm: Use vma_lookup() in dmirror_migrate()
> > mm/ksm: Use vma_lookup() in find_mergeable_vma()
> > mm/migrate: Use vma_lookup() in do_pages_stat_array()
> > mm/mremap: Use vma_lookup() in vma_to_resize()
> > mm/memory.c: Use vma_lookup() instead of find_vma_intersection()
> > radix tree test suite: Enhancements for Maple Tree
> > radix tree test suite: Add support for fallthrough attribute
> > radix tree test suite: Add support for kmem_cache_free_bulk
> > radix tree test suite: Add keme_cache_alloc_bulk() support
> > radix tree test suite: Add __must_be_array() support
> > Maple Tree: Add new data structure
> > mm: Start tracking VMAs with maple tree
> > mm/mmap: Introduce unlock_range() for code cleanup
> > mm/mmap: Change find_vma() to use the maple tree
> > mm/mmap: Change find_vma_prev() to use maple tree
> > mm/mmap: Change unmapped_area and unmapped_area_topdown to use maple
> > tree
> > kernel/fork: Convert dup_mmap to use maple tree
> > mm: Remove rb tree.
> > arch/m68k/kernel/sys_m68k: Use vma_lookup() in sys_cacheflush()
> > xen/privcmd: Optimized privcmd_ioctl_mmap() by using vma_lookup()
> > mm: Optimize find_exact_vma() to use vma_lookup()
> > mm/khugepaged: Optimize collapse_pte_mapped_thp() by using
> > vma_lookup()
> > mm/gup: Add mm_populate_vma() for use when the vma is known
> > mm/mmap: Change do_brk_flags() to expand existing VMA and add
> > do_brk_munmap()
> > mm/mmap: Change vm_brk_flags() to use mm_populate_vma()
> > mm: Change find_vma_intersection() to maple tree and make find_vma()
> > to inline.
> > mm/mmap: Change mmap_region() to use maple tree state
> > mm/mmap: Drop munmap_vma_range()
> > mm: Remove vmacache
> > mm/mmap: Change __do_munmap() to avoid unnecessary lookups.
> > mm/mmap: Move mmap_region() below do_munmap()
> > mm/mmap: Add do_mas_munmap() and wraper for __do_munmap()
> > mmap: Use find_vma_intersection in do_mmap() for overlap
> > mmap: Remove __do_munmap() in favour of do_mas_munmap()
> > mm/mmap: Change do_brk_munmap() to use do_mas_align_munmap()
> > mmap: make remove_vma_list() inline
> > mm: Introduce vma_next() and vma_prev()
> > arch/arm64: Remove mmap linked list from vdso.
> > arch/parisc: Remove mmap linked list from kernel/cache
> > arch/powerpc: Remove mmap linked list from mm/book3s32/tlb
> > arch/powerpc: Remove mmap linked list from mm/book3s64/subpage_prot
> > arch/s390: Use maple tree iterators instead of linked list.
> > arch/x86: Use maple tree iterators for vdso/vma
> > arch/xtensa: Use maple tree iterators for unmapped area
> > drivers/misc/cxl: Use maple tree iterators for cxl_prefault_vma()
> > drivers/tee/optee: Use maple tree iterators for __check_mem_type()
> > fs/binfmt_elf: Use maple tree iterators for fill_files_note()
> > fs/coredump: Use maple tree iterators in place of linked list
> > fs/exec: Use vma_next() instead of linked list
> > fs/proc/base: Use maple tree iterators in place of linked list
> > fs/proc/task_mmu: Stop using linked list and highest_vm_end
> > fs/userfaultfd: Stop using vma linked list.
> > ipc/shm: Stop using the vma linked list
> > kernel/acct: Use maple tree iterators instead of linked list
> > kernel/events/core: Use maple tree iterators instead of linked list
> > kernel/events/uprobes: Use maple tree iterators instead of linked list
> > kernel/sched/fair: Use maple tree iterators instead of linked list
> > kernel/sys: Use maple tree iterators instead of linked list
> > mm/gup: Use maple tree navigation instead of linked list
> > mm/huge_memory: Use vma_next() instead of vma linked list
> > mm/khugepaged: Use maple tree iterators instead of vma linked list
> > mm/ksm: Use maple tree iterators instead of vma linked list
> > mm/madvise: Use vma_next instead of vma linked list
> > mm/memcontrol: Stop using mm->highest_vm_end
> > mm/mempolicy: Use maple tree iterators instead of vma linked list
> > mm/mlock: Use maple tree iterators instead of vma linked list
> > mm/mprotect: Use maple tree navigation instead of vma linked list
> > mm/mremap: Use vma_next() instead of vma linked list
> > mm/msync: Use vma_next() instead of vma linked list
> > mm/oom_kill: Use maple tree iterators instead of vma linked list
> > mm/pagewalk: Use vma_next() instead of vma linked list
> > mm/swapfile: Use maple tree iterator instead of vma linked list
> > mm/util: Remove __vma_link_list() and __vma_unlink_list()
> > arch/um/kernel/tlb: Stop using linked list
> > bpf: Remove VMA linked list
> > mm: Remove vma linked list.
> > mm: Return a bool from anon_vma_interval_tree_verify()
> > mm/mmap: Add mas_split_vma() and use it for munmap()
> > mm: Move mas locking outside of munmap() path.
> >
> > Documentation/core-api/index.rst | 1 +
> > Documentation/core-api/maple-tree.rst | 36 +
> > MAINTAINERS | 12 +
> > arch/arc/kernel/troubleshoot.c | 8 +-
> > arch/arm64/kernel/vdso.c | 5 +-
> > arch/arm64/kvm/mmu.c | 2 +-
> > arch/m68k/kernel/sys_m68k.c | 4 +-
> > arch/mips/kernel/traps.c | 4 +-
> > arch/parisc/kernel/cache.c | 15 +-
> > arch/powerpc/kvm/book3s_hv.c | 4 +-
> > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +-
> > arch/powerpc/mm/book3s32/tlb.c | 5 +-
> > arch/powerpc/mm/book3s64/subpage_prot.c | 15 +-
> > arch/s390/configs/debug_defconfig | 1 -
> > arch/s390/mm/gmap.c | 8 +-
> > arch/um/kernel/tlb.c | 16 +-
> > arch/x86/entry/vdso/vma.c | 12 +-
> > arch/x86/kernel/cpu/sgx/encl.h | 4 +-
> > arch/x86/kernel/tboot.c | 2 +-
> > arch/xtensa/kernel/syscall.c | 4 +-
> > drivers/firmware/efi/efi.c | 2 +-
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +-
> > .../drm/i915/gem/selftests/i915_gem_mman.c | 2 +-
> > drivers/media/common/videobuf2/frame_vector.c | 2 +-
> > drivers/misc/cxl/fault.c | 6 +-
> > drivers/misc/sgi-gru/grufault.c | 4 +-
> > drivers/tee/optee/call.c | 15 +-
> > drivers/vfio/vfio_iommu_type1.c | 2 +-
> > drivers/xen/privcmd.c | 2 +-
> > fs/binfmt_elf.c | 5 +-
> > fs/coredump.c | 13 +-
> > fs/exec.c | 9 +-
> > fs/proc/base.c | 7 +-
> > fs/proc/task_mmu.c | 45 +-
> > fs/proc/task_nommu.c | 55 +-
> > fs/userfaultfd.c | 43 +-
> > include/linux/maple_tree.h | 449 +
> > include/linux/mm.h | 62 +-
> > include/linux/mm_types.h | 33 +-
> > include/linux/mm_types_task.h | 5 -
> > include/linux/sched.h | 1 -
> > include/linux/sched/mm.h | 3 +
> > include/linux/vm_event_item.h | 4 -
> > include/linux/vmacache.h | 28 -
> > include/linux/vmstat.h | 6 -
> > include/trace/events/maple_tree.h | 227 +
> > include/trace/events/mmap.h | 71 +
> > init/main.c | 2 +
> > ipc/shm.c | 13 +-
> > kernel/acct.c | 8 +-
> > kernel/bpf/task_iter.c | 6 +-
> > kernel/debug/debug_core.c | 12 -
> > kernel/events/core.c | 7 +-
> > kernel/events/uprobes.c | 29 +-
> > kernel/fork.c | 50 +-
> > kernel/sched/fair.c | 14 +-
> > kernel/sys.c | 6 +-
> > lib/Kconfig.debug | 15 +-
> > lib/Makefile | 3 +-
> > lib/maple_tree.c | 6393 +++
> > lib/test_hmm.c | 5 +-
> > lib/test_maple_tree.c | 36152 ++++++++++++++++
> > mm/Makefile | 2 +-
> > mm/debug.c | 12 +-
> > mm/gup.c | 27 +-
> > mm/huge_memory.c | 4 +-
> > mm/init-mm.c | 4 +-
> > mm/internal.h | 51 +-
> > mm/interval_tree.c | 6 +-
> > mm/khugepaged.c | 13 +-
> > mm/ksm.c | 32 +-
> > mm/madvise.c | 2 +-
> > mm/memcontrol.c | 6 +-
> > mm/memory.c | 45 +-
> > mm/mempolicy.c | 44 +-
> > mm/migrate.c | 4 +-
> > mm/mlock.c | 26 +-
> > mm/mmap.c | 2271 +-
> > mm/mprotect.c | 7 +-
> > mm/mremap.c | 17 +-
> > mm/msync.c | 2 +-
> > mm/nommu.c | 135 +-
> > mm/oom_kill.c | 5 +-
> > mm/pagewalk.c | 2 +-
> > mm/swapfile.c | 9 +-
> > mm/util.c | 32 -
> > mm/vmacache.c | 117 -
> > mm/vmstat.c | 4 -
> > net/ipv4/tcp.c | 4 +-
> > tools/testing/radix-tree/.gitignore | 2 +
> > tools/testing/radix-tree/Makefile | 13 +-
> > tools/testing/radix-tree/generated/autoconf.h | 1 +
> > tools/testing/radix-tree/linux.c | 78 +-
> > tools/testing/radix-tree/linux/kernel.h | 10 +
> > tools/testing/radix-tree/linux/maple_tree.h | 7 +
> > tools/testing/radix-tree/linux/slab.h | 2 +
> > tools/testing/radix-tree/maple.c | 59 +
> > tools/testing/radix-tree/test.h | 1 +
> > .../radix-tree/trace/events/maple_tree.h | 8 +
> > virt/kvm/kvm_main.c | 2 +-
> > 100 files changed, 45347 insertions(+), 1699 deletions(-)
> > create mode 100644 Documentation/core-api/maple-tree.rst
> > create mode 100644 include/linux/maple_tree.h
> > delete mode 100644 include/linux/vmacache.h
> > create mode 100644 include/trace/events/maple_tree.h
> > create mode 100644 lib/maple_tree.c
> > create mode 100644 lib/test_maple_tree.c
> > delete mode 100644 mm/vmacache.c
> > create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
> > create mode 100644 tools/testing/radix-tree/maple.c
> > create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h
> >
> > --
> > 2.30.2
More information about the maple-tree
mailing list