[PATCH v6 00/71] Introducing the Maple Tree

Vasily Gorbik gor at linux.ibm.com
Sat Feb 26 18:22:57 PST 2022


On Tue, Feb 15, 2022 at 02:37:44PM +0000, Liam Howlett wrote:
> The maple tree is an RCU-safe range based B-tree designed to use modern
> processor cache efficiently.  There are a number of places in the kernel
> that a non-overlapping range-based tree would be beneficial, especially
> one with a simple interface.  The first user that is covered in this
> patch set is the vm_area_struct, where three data structures are
> replaced by the maple tree: the augmented rbtree, the vma cache, and the
> linked list of VMAs in the mm_struct.  The long term goal is to reduce
> or remove the mmap_sem contention.
> 
> The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> nodes.  With the increased branching factor, it is significantly shorter than
> the rbtree so it has fewer cache misses.  The removal of the linked list
> between subsequent entries also reduces the cache misses and the need to pull
> in the previous and next VMA during many tree alterations.
> 
> This patch is based on v5.17-rc4
> 
> git: https://github.com/oracle/linux-uek/tree/howlett/maple/20220214

Hi Liam,

this patch series completely breaks s390. First, there is the endianness
issue in maple trees reported here:

https://lore.kernel.org/all/your-ad-here.call-01645924312-ext-0398@work.hours/

Even with that endianness issue fixed (fixup from the link above), we still
get numerous KASAN reports, starting with

[PATCH v6 10/71] mm: Start tracking VMAs with maple tree

mostly looking like this:

 BUG: KASAN: use-after-free in mas_descend_adopt+0x113a/0x1408
 Read of size 8 at addr 00000000b1d4e900 by task cat/610

 CPU: 34 PID: 610 Comm: cat Tainted: G        W         5.17.0-rc4-91313-g592c3b299aad #1
 Hardware name: IBM 3906 M04 704 (LPAR)
 Call Trace:
  [<000000000234647a>] dump_stack_lvl+0xfa/0x150
  [<0000000002328154>] print_address_description.constprop.0+0x64/0x340
  [<00000000008f5ba6>] kasan_report+0x13e/0x1a8
  [<00000000017f0c12>] mas_descend_adopt+0x113a/0x1408
  [<00000000018076ec>] mas_spanning_rebalance.isra.0+0x5164/0x6a20
  [<000000000180a71e>] mas_wr_spanning_store.isra.0+0x476/0xbc8
  [<00000000018189bc>] mas_store_gfp+0xd4/0x188
  [<0000000000827bae>] vma_mt_szero+0x146/0x368
  [<000000000082e990>] __do_munmap+0x340/0xe20
  [<000000000082f580>] __vm_munmap+0x110/0x1e8
  [<000000000082f76e>] __s390x_sys_munmap+0x6e/0x90
  [<000000000010dc6c>] do_syscall+0x22c/0x328
  [<000000000234d3d2>] __do_syscall+0x9a/0xf8
  [<0000000002374ed2>] system_call+0x82/0xb0
 INFO: lockdep is turned off.

 Allocated by task 610:
  kasan_save_stack+0x34/0x58
  __kasan_slab_alloc+0x84/0xa8
  kmem_cache_alloc+0x20c/0x520
  mas_alloc_nodes+0x26a/0x4c8
  mas_split.isra.0+0x2aa/0x1418
  mas_wr_modify+0x3fa/0xd28
  mas_store_gfp+0xd4/0x188
  vma_store+0x17a/0x3d8
  vma_link+0xac/0x798
  mmap_region+0xa5a/0x10b8
  do_mmap+0x7c2/0xa90
  vm_mmap_pgoff+0x186/0x250
  ksys_mmap_pgoff+0x334/0x400
  __s390x_sys_old_mmap+0xf4/0x130
  do_syscall+0x22c/0x328
  __do_syscall+0x9a/0xf8
  system_call+0x82/0xb0
 Freed by task 610:
  kasan_save_stack+0x34/0x58
  kasan_set_track+0x36/0x48
  kasan_set_free_info+0x34/0x58
  ____kasan_slab_free+0x11c/0x188
  __kasan_slab_free+0x24/0x30
  kmem_cache_free_bulk.part.0+0xec/0x538
  mas_destroy+0x2e4/0x710
  mas_store_gfp+0xf4/0x188
  vma_mt_szero+0x146/0x368
  __vma_adjust+0x155a/0x2188
  __split_vma+0x228/0x450
  mprotect_fixup+0x4f2/0x618
  do_mprotect_pkey.constprop.0+0x328/0x600
  __s390x_sys_mprotect+0x8e/0xb8
  do_syscall+0x22c/0x328
  __do_syscall+0x9a/0xf8
  system_call+0x82/0xb0

 The buggy address belongs to the object at 00000000b1d4e900
  which belongs to the cache maple_node of size 256
 The buggy address is located 0 bytes inside of
  256-byte region [00000000b1d4e900, 00000000b1d4ea00)
 The buggy address belongs to the page:
 page:0000400002c75200 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xb1d48
 head:0000400002c75200 order:3 compound_mapcount:0 compound_pincount:0
 flags: 0x3ffff00000010200(slab|head|node=0|zone=1|lastcpupid=0x1ffff)
 raw: 3ffff00000010200 0000400002c91e08 000040000263b608 000000008009ed00
 raw: 0000000000000000 0020004000000000 ffffffff00000001 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  00000000b1d4e800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  00000000b1d4e880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 >00000000b1d4e900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                    ^
  00000000b1d4e980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  00000000b1d4ea00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

with an eventual crash.

How come neither the s390 architecture maintainers nor the s390 mailing
list were on CC for changes which affect s390 to that magnitude? Apparently
none of this has been tested on s390 or any other big-endian system. A
shoulder tap to give it a try would have been helpful.

We are only now starting to look at the problems. Until these issues
are resolved, this patch series has to be dropped from linux-next.


