[PATCH v9 28/69] mm/mmap: reorganize munmap to use maple states

Fri Jun 17 06:49:59 PDT 2022

* Yu Zhao <yuzhao at google.com> [220616 14:35]:
> On Thu, Jun 16, 2022 at 12:27 PM Liam Howlett <liam.howlett at oracle.com> wrote:
> >
> > * Yu Zhao <yuzhao at google.com> [220616 01:56]:
> > > On Wed, Jun 15, 2022 at 11:45 PM Yu Zhao <yuzhao at google.com> wrote:
> > > >
> > > > On Wed, Jun 15, 2022 at 9:02 PM Yu Zhao <yuzhao at google.com> wrote:
> > > > >
> > > > > On Wed, Jun 15, 2022 at 8:56 PM Liam Howlett <liam.howlett at oracle.com> wrote:
> > > > > >
> > > > > > * Yu Zhao <yuzhao at google.com> [220615 21:59]:
> > > > > > > On Wed, Jun 15, 2022 at 7:50 PM Liam Howlett <liam.howlett at oracle.com> wrote:
> > > > > > > >
> > > > > > > > * Yu Zhao <yuzhao at google.com> [220615 17:17]:
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > > > Yes, I used the same parameters with 512GB of RAM, and the kernel with
> > > > > > > > > > KASAN and other debug options.
> > > > > > > > >
> > > > > > > > > Sorry, Liam. I got the same crash :(
> > > > > > > >
> > > > > > > > Thanks for running this promptly.  I am trying to get my own server
> > > > > > > > setup now.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 9d27f2f1487a (tag: mm-everything-2022-06-14-19-05, akpm/mm-everything)
> > > > > > > > > 00d4d7b519d6 fs/userfaultfd: Fix vma iteration in mas_for_each() loop
> > > > > > > > > 55140693394d maple_tree: Make mas_prealloc() error checking more generic
> > > > > > > > > 2d7e7c2fcf16 maple_tree: Fix mt_destroy_walk() on full non-leaf non-alloc nodes
> > > > > > > > > 4d4472148ccd maple_tree: Change spanning store to work on larger trees
> > > > > > > > > ea36bcc14c00 test_maple_tree: Add tests for preallocations and large
> > > > > > > > > spanning writes
> > > > > > > > > 0d2aa86ead4f mm/mlock: Drop dead code in count_mm_mlocked_page_nr()
> > > > > > > > >
> > > > > > > > > ==================================================================
> > > > > > > > > BUG: KASAN: slab-out-of-bounds in mab_mas_cp+0x2d9/0x6c0
> > > > > > > > > Write of size 136 at addr ffff88c35a3b9e80 by task stress-ng/19303
> > > > > > > > >
> > > > > > > > > CPU: 66 PID: 19303 Comm: stress-ng Tainted: G S        I       5.19.0-smp-DEV #1
> > > > > > > > > Call Trace:
> > > > > > > > >  <TASK>
> > > > > > > > >  dump_stack_lvl+0xc5/0xf4
> > > > > > > > >  print_address_description+0x7f/0x460
> > > > > > > > >  print_report+0x10b/0x240
> > > > > > > > >  ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > >  kasan_report+0xe6/0x110
> > > > > > > > >  ? mast_spanning_rebalance+0x2634/0x29b0
> > > > > > > > >  ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > >  kasan_check_range+0x2ef/0x310
> > > > > > > > >  ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > >  ? mab_mas_cp+0x2d9/0x6c0
> > > > > > > > >  memcpy+0x44/0x70
> > > > > > > > >  mab_mas_cp+0x2d9/0x6c0
> > > > > > > > >  mas_spanning_rebalance+0x1a3e/0x4f90
> > > > > > > >
> > > > > > > > Does this translate to an inline around line 2997?
> > > > > > > > And then probably around 2808?
> > > > > > >
> > > > > > > $ ./scripts/faddr2line vmlinux mab_mas_cp+0x2d9
> > > > > > > mab_mas_cp+0x2d9/0x6c0:
> > > > > > > mab_mas_cp at lib/maple_tree.c:1988
> > > > > > > $ ./scripts/faddr2line vmlinux mas_spanning_rebalance+0x1a3e
> > > > > > > mas_spanning_rebalance+0x1a3e/0x4f90:
> > > > > > > mast_cp_to_nodes at lib/maple_tree.c:?
> > > > > > > (inlined by) mas_spanning_rebalance at lib/maple_tree.c:2997
> > > > > > > $ ./scripts/faddr2line vmlinux mas_wr_spanning_store+0x16c5
> > > > > > > mas_wr_spanning_store+0x16c5/0x1b80:
> > > > > > > mas_wr_spanning_store at lib/maple_tree.c:?
> > > > > > >
> > > > > > > No idea why faddr2line didn't work for the last two addresses. GDB
> > > > > > > seems more reliable.
> > > > > > >
> > > > > > > (gdb) li *(mab_mas_cp+0x2d9)
> > > > > > > 0xffffffff8226b049 is in mab_mas_cp (lib/maple_tree.c:1988).
> > > > > > > (gdb) li *(mas_spanning_rebalance+0x1a3e)
> > > > > > > 0xffffffff822633ce is in mas_spanning_rebalance (lib/maple_tree.c:2801).
> > > > > > > (gdb) li *(mas_wr_spanning_store+0x16c5)
> > > > > > > 0xffffffff8225cfb5 is in mas_wr_spanning_store (lib/maple_tree.c:4030).
> > > > > >
> > > > > >
> > > > > > Thanks.  I am not having luck recreating it.  I am hitting what looks
> > > > > > like an unrelated issue in the unstable mm, "scheduling while atomic".
> > > > > > I will try the git commit you indicate above.
> > > > >
> > > > > Fix here:
> > > > > https://lore.kernel.org/linux-mm/20220615160446.be1f75fd256d67e57b27a9fc@linux-foundation.org/
> > > >
> > > > A seemingly new crash on arm64:
> > > >
> > > > KASAN: null-ptr-deref in range [0x0000000000000000-0x000000000000000f]
> > > > Call trace:
> > > >  __hwasan_check_x2_67043363+0x4/0x34
> > > >  mas_wr_store_entry+0x178/0x5c0
> > > >  mas_store+0x88/0xc8
> > > >  dup_mmap+0x4bc/0x6d8
> > > >  dup_mm+0x8c/0x17c
> > > >  copy_mm+0xb0/0x12c
> > > >  copy_process+0xa44/0x17d4
> > > >  kernel_clone+0x100/0x2cc
> > > >  __arm64_sys_clone+0xf4/0x120
> > > >  el0_svc_common+0xfc/0x1cc
> > > >  do_el0_svc_compat+0x38/0x5c
> > > >  el0_svc_compat+0x68/0xf4
> > > >  el0t_32_sync_handler+0xc0/0xf0
> > > >  el0t_32_sync+0x190/0x194
> > > > Code: aa0203e0 d2800441 141e931d 9344dc50 (38706930)
> > >
> > > And bad rss counters from another arm64 machine:
> > >
> > > BUG: Bad rss-counter state mm:a6ffff80895ff840 type:MM_ANONPAGES val:4
> > > Call trace:
> > >  __mmdrop+0x1f0/0x208
> > >  __mmput+0x194/0x198
> > >  mmput+0x5c/0x80
> > >  exit_mm+0x108/0x190
> > >  do_exit+0x244/0xc98
> > >  __arm64_sys_exit_group+0x0/0x30
> > >  __wake_up_parent+0x0/0x48
> > >  el0_svc_common+0xfc/0x1cc
> > >  do_el0_svc_compat+0x38/0x5c
> > >  el0_svc_compat+0x68/0xf4
> > >  el0t_32_sync_handler+0xc0/0xf0
> > >  el0t_32_sync+0x190/0x194
> > > Code: b000b520 91259c00 aa1303e1 94482015 (d4210000)
> > >
> >
> > What was the setup for these two?  I'm running trinity, but I suspect
> > you are using stress-ng?
> 
> That's correct.
> 
> > If so, what are the arguments?  My arm64 vm is
> > even lower memory than my x86_64 vm so I will probably have to adjust
> > accordingly.
> 
> I usually lower the N for `-a N`.

I'm still trying to reproduce the bugs you are seeing.  I sent out two
fixes, with you on cc, that may help with at least the last one here.
My thinking is that not enough pre-allocation was happening, so some of
the munmap events were being missed.  I fixed this by no longer
pre-allocating the side tree and returning -ENOMEM instead.  This is
safe since munmap may already need to allocate for splits anyway.
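
To illustrate the idea only: below is a small userspace sketch of
"allocate on demand and propagate -ENOMEM" in place of reserving memory
up front.  It is not the actual patch and does not use the maple tree
API; `side_tree`, `side_alloc`, `side_tree_store`, `detach_range` and
`fail_next_alloc` are all hypothetical names made up for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a side tree that collects detached entries. */
struct side_tree {
	unsigned long *slots;
	size_t nr;	/* entries currently stored */
	size_t cap;	/* entries the buffer can hold */
};

/* Illustration-only hook: when non-zero, the next allocation fails. */
static int fail_next_alloc;

static void *side_alloc(void *old, size_t size)
{
	if (fail_next_alloc) {
		fail_next_alloc = 0;
		return NULL;
	}
	return realloc(old, size);
}

/* Grow on demand; report -ENOMEM instead of relying on a reservation. */
static int side_tree_store(struct side_tree *st, unsigned long entry)
{
	if (st->nr == st->cap) {
		size_t cap = st->cap ? st->cap * 2 : 4;
		unsigned long *slots;

		slots = side_alloc(st->slots, cap * sizeof(*slots));
		if (!slots)
			return -ENOMEM;
		st->slots = slots;
		st->cap = cap;
	}
	st->slots[st->nr++] = entry;
	return 0;
}

/* Store entries first .. first + count - 1, or fail without side effects. */
static int detach_range(struct side_tree *st, unsigned long first,
			size_t count)
{
	size_t start = st->nr;
	size_t i;

	for (i = 0; i < count; i++) {
		int ret = side_tree_store(st, first + i);

		if (ret) {
			st->nr = start;	/* drop the partial work */
			return ret;
		}
	}
	return 0;
}
```

The point of the sketch is that the caller sees either a complete
operation or an error with the earlier contents untouched, which is why
an extra failure path is acceptable when the operation could already
fail on allocation elsewhere.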